Is Forces For Good Flawed?

A colleague of mine made the argument to me yesterday that Forces for Good was flawed because the twelve organizations from which the Six Practices of High Impact Nonprofits were derived have not been proven to have achieved impact. This is true. The methodology used by Crutchfield and Grant did not employ double blind studies or even similarly rigorous analysis of the selected nonprofits’ program effectiveness. Instead, the authors used peer surveys, expert interviews and their own in-depth research. Why didn’t Crutchfield and Grant simply use twelve proven high impact nonprofits as their research set?

Because no one can find them.

Philanthropy loves to talk about proven high impact nonprofits. We love to talk about “programs that work”. But the fact is, very, very few nonprofits have ever gone through extensive analysis that has proven that their programs have impact. Have you ever noticed that I talk about Nurse Family Partnership a lot? That’s because it is essentially in a class by itself. Other proven high impact nonprofits exist, but you’d have a hard time getting a consensus on twelve that Crutchfield and Grant could have used in their book.

Proven high impact nonprofits appear in lots of theories of effective philanthropy. But they aren’t seen very often in the wild.

So we have two options. We either spend our time running studies trying to prove whether nonprofit programs actually work or we follow the lead of Crutchfield and Grant and create a working model based on a set of plausible assumptions and get to work building the nonprofit organizations that we think are best positioned to create programs that will one day be proven to work.

Of course we still need to do evaluation. Of course, we need to look at claimed program effectiveness with a skeptical eye. Of course we need to demand that nonprofits constantly evaluate their programs in the most rigorous way to determine if they are working. Of course we need to fund outside evaluators to do the same.

But we also need to get on with the business of building a world class ecosystem of high performance nonprofits.

So I agree with my colleague’s critique. The nonprofits studied in Forces for Good have not been proven to be high impact organizations. The set of best practices distilled from those organizations might be incorrect if it turns out that some of the selected nonprofits’ programs were not particularly effective.

But guess what? Warren Buffett doesn’t run double blind studies to see if the companies he’s invested in have “proven” profit centers. We happen to know that lots of for-profit companies looked like they were doing just dandy until last year. Their profit centers turned out to be a sham, based on false assumptions and they blew up.

Warren Buffett was invested in some of those unproven companies.

He’s still the richest person in the world and he’s not changing the way he invests.

I wish we lived in a world populated with proven high impact organizations. I wish we lived in a world with proven stock trading systems that could guarantee a market beating rate of return.

But we don’t. Let’s get on with the business of philanthropy. Philanthropy’s job is providing capital to high performance nonprofits and refusing to fund low performing nonprofits. It is our job to invest in the best possible organizations and provide them the resources they need to grow and enhance their impact. It is a messy business. We’ll never know for sure if we’re backing the right organizations. But that’s the hand we’ve been dealt.

Even when we do “prove” that a program works, we’re not talking about proof of a mathematical or physical property. Proof in nonprofit effectiveness means that rigorous studies have been performed and that should give us a lot of confidence. But even a quick search of Google returns stories like this one on natural-selection studies:

Scientists at Penn State and the National Institute of Genetics in Japan have demonstrated that several statistical methods commonly used by biologists to detect natural selection at the molecular level tend to produce incorrect results. "Our finding means that hundreds of published studies on natural selection may have drawn incorrect conclusions."

We live in a world of uncertainty. The way forward is not to demand proof before moving. It is to get moving this very second while at the same time constantly striving to check our assumptions. And when we find our beliefs were incorrect? Turn on a dime and barrel forward on the newest, most probable course.


  1. Jacob M. says:

    From my personal experience in the nonprofit sector, I’d have to disagree with Sean about making high performance the starting point for funders.

    Certainly no one would disagree that one would want to look for the intersection of high-performance and high-impact. But outside of that intersection, there’s still value in helping nonprofits that have less-than-stellar performance but proven program models develop into stronger organizations. Ultimately it’s an empirical question, but my belief is that it’s easier to make investments to improve performance in moderately performing organizations than it is to help effective organizations adopt/create higher impact program models. Sean’s point that there are few “proven models” out there emphasizes the need to improve the performance of the few organizations out there with a decent evidence base, rather than relying on high performing organizations to generate their own. And funders can play a crucial role through providing resources that help nonprofits improve decision-making, talent recruitment and retention, and other key ingredients to performance. Moreover, these investments are discrete, incremental, and tangible ones that funders can easily make and hold themselves accountable for.

    I think the D.A.R.E. program is a great example of the danger of starting with high performance. I won’t go into detail, but D.A.R.E. would probably score pretty well on all of the “Forces for Good” criteria laid out in previous posts. Indeed, D.A.R.E.’s rapid diffusion in spite of a fairly robust set of evidence suggesting little impact has been a subject of heated debate among funders. And as organizational sociology and the D.A.R.E. experience teach us, people have strong commitments to institutions (even failing ones) and will fight tooth and nail to preserve them – even if that means squandering resources and ultimately forsaking impact.

    Maybe there are a few exceptions (such as funding new nonprofits/program models that haven’t had a chance to demonstrate impact), but overall I think the safer bet for funders is to screen for impact and then performance. After all, we know many of the steps one can take to build stronger organizations; however, the recipe for impact is still elusive for many nonprofits, including the strongest of organizations.

  2. Thanks for your recent series of posts. Couldn’t agree more!

    Innovation literature and practice have shown that frequent and rapid adjustments, based on BAD (Best Available Data)-based assumptions, will lead to the most deliberate and efficient progress.

    Another way of looking at it is organizations are nearly always better off DOING something rather than not, so long as they are thoughtful before taking action and use the results to drive lessons for their next action.

    Scott Bechtler-Levin

  3. Thanks for the excellent comment Jacob. Obviously I disagree with your last paragraph. I think it is nonprofits who are best positioned to “screen for impact”, but only if they are high performing organizations.

    That being said, I agree with a number of your points. Should we only fund high performing orgs or is there room for grants that help average orgs become great? I think there’s room for both, but I think that high performing orgs currently do not get anywhere close to the advantage they deserve in looking for funding. Much more should go to these orgs than currently does.

    However, I also believe fully that high performance is defined differently depending on the maturity of an org. A startup might just show great promise of being high performance. A very mature org may have once been great, fallen on hard times and need to turn around. These are all great opportunities. But I think the thrust of funding should be providing capital to high performing orgs.

    Separately, your point about DARE is important. It seems from the comments that some people have taken my posts to imply that I don’t believe that outcomes need to be evaluated. That a high performing org is great even if it is running a worthless program. That’s not what I’ve meant to communicate and in my next post I’ll offer a critical assumption (or demand) that is required for high performance to lead to high impact.

  4. Sean,
    Thanks for referencing our book, Forces for Good, as an example of both high-performing and high-impact nonprofits. I wanted to weigh in on the debate, and the larger challenges of evaluation in our sector. First, a caveat – neither Leslie nor I are “evaluation experts,” which is perhaps why we approached our methodology so pragmatically. We took both a “bottom-up” crowdsourcing approach (asking thousands of people in various fields to rate their peers and tell us who was high-impact), combined with a “top-down” approach, asking experts in these fields (including funders and academics) to help rank these organizations based on their deep knowledge of the field and various players. We also looked at whatever self-reported data we could find, and examined the evidence of “impact,” for each of the organizations we ended up writing about. They all had to meet a basic test of proving sustained results (outputs, outcomes, and impact to the extent measurable), and larger “systemic change” as well (changing policy, markets, fields, or whole systems). But each organization measured these things differently, not surprisingly – and they were at different stages in their evolution along this path.
    Whether or not they are “high-performing” is debatable, depending on how you define the term: they all have had impact, but not all are perfectly managed organizations. Which is why Leslie and I have rejected emphasizing metrics that focus only on internal measurements, like overhead ratios, and focused more on the question of what are they achieving against their mission, and the larger social good. The six practices we discovered really have to do with paths to scaling impact and achieving greater results (broadly defined), more than internal operational management or performance. I do think the sector could do more to emulate these nonprofits’ approach illustrated in the practice of “Adaptation”: using data (often real-time-feedback, not long term control group studies) as a method for constantly improving performance (and therefore impact), rather than seeing it merely as a tool for reporting to funders.
    Of course, our first challenge in researching and writing our book was defining “impact” and then selecting organizations that met *our* criteria. None of these 12 nonprofits had “proven” impact according to a control-group study. We used a ‘grounded methodology’ (combined with selection above) to determine which nonprofits were considered to have the highest impact within each field. As we went through this process, we learned just how challenging this question of defining impact in the social sector truly is, for numerous reasons:
    – One: There is no single metric like “shareholder value” in our field which is an agreed-upon proxy for impact (bear in mind even in the for-profit sector there are many other ratios/ indicators that investors look at to assess organizational performance)
    – Two: There’s the challenge of comparing apples to oranges: what you’re measuring as outcomes/impact in education (closing the achievement gap, graduation rates, etc.) is quite different from what you’d measure in the environmental field (reduced carbon emissions, protected species, etc.) – in fact, the outcomes will vary widely field to field, from organization to organization, which makes the quest for a standard of proof, or a single metric, meaningless
    – Three: Not all types of “impact” lend themselves to quantification in the first place; some types of social impact are intrinsically qualitative; e.g. you can measure the number of people who attend a symphony (an output), and overall membership rates over time, but how can you value the contribution of the arts to society? The impact of hearing well-performed music on individuals’ lives?
    – Four: The time-scale it takes to achieve social change is often not easily measured in quarters or years, but in decades; so funders seeking “quick wins” will often shut down a program too soon; we know from evidence that programs like Head Start make a meaningful difference in kids’ lives, but often the real benefits don’t show up until much later in life. Or, funders seeking to build social movements will need to invest for the longer term – the Civil Rights Movement was decades in the making before it reached real impact (changing laws and public behaviors to increase racial equality). So what is the right timeframe to measure impact?
    – Five: Causality is often difficult to assign in the social sector. We know from experience that nonprofits are often working on multi-variable problems (e.g. children in poverty), and that their intervention might address only one piece of the problem (e.g. feeding them via Kid’s Cafes; helping them learn to read; mentoring them, etc.) So, while you can measure outputs and possibly outcomes, very few of these programs are solving the entirety of the problem – and many other programs are also working with these kids. Who then gets credit if an impoverished child who participated in various programs goes on to Harvard? And, we also know – with increasing evidence – that true systemic change requires collaboration: working in coalitions, alliances, addressing the problem at the level of the whole system via networked approaches. So in these cases, which organization gets credit for the “impact”? Our current evaluation methods do little to address this problem of collective action and assigning causality.
    – Six and last: One of the biggest ah-has from our research was that the best nonprofits often don’t focus merely on direct-impact, or measurable outcomes, but rather, they focus on larger systemic change – what we call “indirect impact” at the Monitor Institute. Over time, the highest-impact nonprofits start trying to address larger systems failures, and get to root causes or more sustainable long-term solutions than the band-aids that many social service programs provide. In other words: they end up advocating to change public policy (the Civil Rights Act; more funding for education); or they end up correcting for market imperfections and externalities by reducing the negative aspects of business (Environmental Defense Fund’s partnership with WalMart), or they seek to change public behavior on a large scale (MADD).

    – Most nonprofits exist in the first place because of failure by markets or by government. And therefore, to be truly successful, they often move from providing services to actually seeking to influence whole systems. In this case, the “impact” they are having cannot be measured directly by their program’s output (number of teachers trained and placed in Teach for America’s case), or even their organization’s performance (being well-managed and highly data-driven)…Rather, Teach for America’s largest impact is arguably in the alumni network it has built, and what those alumni are now doing with their lives to transform public education in America. So Teach for America now has this in their theory of change, and they now track what their alumni do, and they actively provide programming around building this network: yet, none of this would lend itself to conventional evaluation methods, and that’s my point. Changing policy, or changing business, or building networks or social movements doesn’t seem to count as “impact” according to many of the methods we’ve set up for evaluating success: most current methods stop at the level of the program, or the organization—all of which are necessary, but not sufficient, and ultimately means to a larger end.
    In conclusion – and I realize this is much longer than what I’d first intended to write – I’m not saying we shouldn’t evaluate: we should. (And I’m not an expert who can elucidate the pros and cons of various methods.) But we should do so thoughtfully, and weighing all of the challenges and complexities I’ve outlined above – recognizing that a one-size-fits-all approach won’t possibly work for the diversity of the social sector, and that we need a fairly broad and somewhat flexible definition of impact. I do advocate for moving away from a system where we evaluate nonprofits on arcane internal metrics like their “overhead ratios” which do nothing to illustrate the return on a donor’s investment, or the long term outcomes. We should focus much more on “impact”—even more than on performance (if the latter is overly focused on organizational perfection, rather than social change). We should do our best to clarify logic models, and elucidate theories of change, and measure what we can measure, and seek leading indicators, and try to assign some causality where we can. And we should try to steer funding to these “high-impact” nonprofits rather than letting decisions be only guided by conventional internal metrics, or feel-good stories.
    But we shouldn’t become prisoner to these tools and methods either, or let the perfect become the enemy of the good. There’s a danger in becoming too obsessed with the data and tools, as well-intentioned as we might be; or using the wrong tool in the wrong context, and missing the larger picture. Sometimes philanthropy requires a leap of faith; and smart investing requires being thoughtful and weighing multiple factors, and understanding the context in which these nonprofits operate, and exercising judgment. It’s this that will ultimately lead to higher-impact nonprofits, and to a higher-performing social sector.

  5. For daily stories and comment on high quality evidence-based prevention and early intervention programs see –

  6. Wonderful and helpful comments. The one item on your list that intrigues me is number 2: about not comparing apples and oranges. I hope that doesn’t happen. But certainly organizations working in the same field or addressing the same issues can and should be measured both against outcomes they’ve committed to achieve and how they compare with each other.