Rebooting Nonprofit Evaluation Debate

A lively debate about nonprofit evaluation and metrics has been raging in response to my request for input on my meeting later this week. However, the conversation has splintered into a debate over whether a systematic, “metric”-driven process of scientific measurement is needed, or whether the frame of scientific measurement is “an epistemologically impoverished frame” through which to understand nonprofit evaluation.

I personally believe evaluating nonprofits is mostly about evaluating their output (the social good they produce). Since it is difficult (impossible?) to quantify this output, I think the focus on metrics as a framework for evaluation is misplaced. Metrics can be used, but they should be designed on a case-by-case basis for each situation. That being said, I think the conversation has fallen into the trap of being constrained by historical frames of reference.

I want to have a different conversation.

I’m interested in what information is available to donors who want to evaluate a nonprofit and which of this information is useful. Google is mostly a resource that points to information; they don’t tend to create a lot of their own content. So if we imagine a future version of the nonprofit data inside of Google Finance, I don’t imagine it will be some new metric that we design. Instead, it will point to existing information on the web. When I first wrote about nonprofit info in Google Finance, I said I hoped they would not display Charity Navigator ratings (although I would support them noting if a nonprofit had a zero or one star rating, since I do believe that a Charity Navigator rating at this level is a significant red flag).

So the conversation I want to have is what information do readers think that donors should consider when evaluating a nonprofit? Then secondly, where or how can this information be captured online so that it can be displayed in Google Finance?

Open Invitation to Foundation Employees

I realize that if you work at a foundation, you may not want to jump into a conversation that involves telling another foundation what to do. However, the conversation we’re having here is really important and would not be complete without the input of the army of program officers (i.e., nonprofit evaluators) who read this blog. So please consider commenting anonymously (just let us know you’re a program officer) or comment publicly and realize that we’re having a broad conversation about nonprofit evaluation that goes beyond Google Finance.

Open Invitation to Nonprofit Employees

A conversation about nonprofit evaluation would not be complete without the input of the nonprofits being evaluated. What information do you, as nonprofits, want donors looking at when they evaluate you? It could be that someday the Google Finance page about your organization becomes the top-ranked search result on Google for your nonprofit. What do you want on that page?


  1. young staffer says:

    You spoke too quickly, Sean – depending on the mission, the number of volunteer hours and the number of volunteers actually can be a direct sign of effectiveness. If, for example, community organizing or community building is a central activity, the number of people who voluntarily contribute hours does demonstrate effectiveness. And think about “Forces for Good” and the emphasis that model of effectiveness puts on creating volunteers who are evangelists for an organization and advocates for a cause. It’s not a perfect metric (I’d want to know volunteer turnover and event attendance too), but it is a metric of effectiveness for some organizations.

  2. Good point, Young Staffer. If community building is the mission, then clearly volunteer work is critical. I do think that volunteers are a key to being effective, but they are not an indicator of effectiveness. In other words, if you want to be effective, getting evangelical volunteers who are advocates (à la Forces for Good) is a very good idea. But if you look at an organization and see that they have lots of volunteers, it does not mean they are effective.

    But as I said in my comment, “David, I really like your idea of volunteer hours. I continue to think that there is no one perfect metric, but the commitment level of an organization’s volunteers is part of the whole story.”

    I’ll see if I can get the Forces for Good authors to weigh in here.

  3. young staffer says:

    Expense ratios are useful for one thing: finding the most abusive charities. What I resent about Charity Navigator is that they use it to supposedly identify the best (the four star vs. the three star) and not just the worst – they claim to be an “evaluator,” not just a watchdog.

    There are two problems with expense ratios and the broader Charity Navigator methodology: 1) accounting in the sector is not standardized and this invites gaming – a fixable problem, I admit; and, MORE IMPORTANTLY, 2) it perpetuates a giving market defined by one type of efficiency and not effectiveness – the result of which is ultimately inefficient.

    It’s the difference between monetary/financial efficiency and economic efficiency, something far too often neglected in our discussions of spending in general.

    A relevant comparison here is the question of whether the charitable tax deduction is efficient. In terms of treasury efficiency, this is the question of whether it helps charities raise more money than it takes away from the government, thereby having more money end up in “public” coffers (nonprofit and government combined) than would have without the deduction. That’s mostly an empirical question.

    In terms of ECONOMIC efficiency, however, the question is whether the charities actually spend the lost tax revenue BETTER than the government would have. Thus, the question of whether or not the charitable tax deduction is actually efficient and effective policy turns not just on how much money it pumps into the social sector, but on whether nonprofits spend it better than the government. If they don’t, then we’d be better off giving it to the government. How you make the calculation is much more complicated; it has to do with the values you hold, what the government would do with the extra money, what would happen to the giving in the sector without the deduction, etc. It ultimately has to do with how you evaluate the social benefits of nonprofit sector activities versus government activities.

    That’s the kind of question expense ratios ignore about charities. A charity that can save 5 infants’ lives for $100 and spends 20% on fundraising is more economically efficient and more effective than one that saves 1 infant for the same $100 and spends 10% on fundraising. Sure, less of your money went to the programming in the first one, but it actually went farther – the social benefit of your dollar is empirically greater. They spent less on programming, but they did more with it. Granted, one would hope that the first charity would figure out how to do their marketing a little better and get larger gifts, but their failure to do so does not change the fact of their superior efficiency. Even as an inefficient fundraiser, your money was better spent on them. GiveWell, as opposed to Charity Navigator, attempted to address that question.
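
    As a rough worked version of the comparison above, here is a minimal sketch using only the figures given in this comment. The labels “A” and “B” and the `Charity` class are hypothetical names introduced for illustration; this is not how Charity Navigator or GiveWell actually compute anything.

```python
from dataclasses import dataclass


@dataclass
class Charity:
    name: str                   # hypothetical label, not a real organization
    lives_saved_per_100: float  # infants saved per $100 donated (from the comment)
    fundraising_share: float    # fraction of each dollar spent on fundraising

    def program_share(self) -> float:
        """Fraction of each donated dollar that reaches programming."""
        return 1.0 - self.fundraising_share

    def cost_per_life(self) -> float:
        """Donated dollars needed to save one infant."""
        return 100.0 / self.lives_saved_per_100


a = Charity("A", lives_saved_per_100=5, fundraising_share=0.20)
b = Charity("B", lives_saved_per_100=1, fundraising_share=0.10)

for c in (a, b):
    print(f"Charity {c.name}: program share {c.program_share():.0%}, "
          f"cost per life saved ${c.cost_per_life():.2f}")
```

    On the expense-ratio view, B looks better because more of each dollar reaches programming; on cost per life saved, A is five times cheaper. The two rankings point in opposite directions, which is the whole problem.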

    Knowing that an organization is spending money on programming and raising more (growing) is not at all the same as knowing whether they spend the money on programming well. The end result is an economically inefficient market – one that does not maximize giving to the organizations who are more efficient at turning dollars into social benefit.

    Now, whether or not Charity Navigator has helped or hurt the efficiency of the nonprofit capital market on balance is a question I can’t help you with. I have no idea how it’s actually shaped donor behavior. To the extent that most charities that spend 50% of their budget on fundraising are probably not economically, monetarily or organizationally more efficient than others, and that it has helped donors avoid them, the market is more efficient. But in allowing people to over-simplify the question of efficiency, it can hurt the market.

  4. Phil Steinmeyer says:

    Very well said, Young Staffer.

    I think Charity Navigator has its place in the world, but the problem is when folks start and stop their analysis with CN.

    I want to see a focus on effectiveness, not just efficiency. That’s what GiveWell’s analysis tries to get at. But I think the GiveWell approach is difficult to implement. It requires a very detailed picture of a charity’s operations, one that was apparently difficult to coax from the charities, and this caused further issues.

    When you look at the African Health report, you see that the three charities they rated highest differ most obviously in approach. One is focused on cosmetic surgery (albeit relatively cheap and potentially life-changing surgery like fixing disfigurements). One focuses broadly on diseases (like AIDS and malaria). Another focuses on health infrastructure (clinics and the like in poor areas).

    But to reach those conclusions, they had to rely on the data supplied by the charities (difficult to get, potentially unreliable), and their own estimates used in the analysis (they try to translate certain raw numbers, like # of bednets distributed, to impact values, like lives saved).

    An alternative approach would be to look at what the academic community has to say about effective interventions in Africa, and then work down from there.

    Basically, instead of looking at raw data from charities, and trying to figure out how effective they really are, one could look at what has been shown to be effective, and then find charities using those techniques.

    This would omit the detailed spreadsheet analysis of each charity, and instead rely on the assumption that, within particular program areas, charities are likely to be roughly equal in effectiveness; i.e., if insecticide-treated bednets are highly effective, and two charities are both distributing them, then they are likely to be roughly as effective as each other. You’d still want some secondary analysis to help out here, including fundraising efficiency measures, and some subjective assessments of a charity’s experience and on-the-ground resources in a particular area. And conclusions would still be imperfect. But I think you’d have about the same degree of accuracy in your final conclusion as with GiveWell’s approach, albeit without the need for the hard-to-obtain detailed spreadsheet data.

  5. Tom Ralser says:

    There are positions on nonprofit metrics for every taste, yet we have found the menu to be so confusing and divisive that many lose their appetite. The discussion on metrics becomes productive only when the conversation shifts from one of being threatened, compared, or evaluated to one of using outcomes (and their values) as a tool. We feel the highest and best use of that tool is in ensuring sustainable funding.

    The bottom line on this discussion is that there will never be a universal metric to measure impact, effectiveness, or whatever term one prefers. Can metrics be used improperly? Yes. Can they be manipulated? Yes. Can they be difficult to quantify and communicate? Yes. These possibilities do not outweigh the benefits of a well-designed and well-implemented program of demonstrating a nonprofit’s value.

    Our position in this confusing arena does not come from a philosophical or academic perspective: it comes from the trenches of helping nonprofits get the funding they deserve. Those nonprofits that make it easier for funders to connect the dots to their respective outcomes deserve to have capital flow to them. Those that choose not to go down this path do so at their own financial risk.

  6. Thank you for keeping this thread constructive. It’s very informative. Having reread it, I have some thoughts to share, many of which have already been stated above:

    It is important to keep in mind that everyone is involved with the nonprofit sector. In addition to roles we may choose to play as donors, administrators, volunteers, grant makers, etc., most of us are also involved with NPOs as taxpayers. By accepting the rules which accord nonprofits a special tax status, we underwrite them and gain the right to an “adequate” evaluation.

    Foundations are NPOs, although they are not 501(c)(3)s. As NPOs they get a tax break and are open to public scrutiny.

    Given our interest in NPOs, we need a system that permits us to evaluate them in terms of their efficiency and effectiveness. Reports (such as the revised 990) provide some critically important information, but the general consensus is that we need additional tools for gathering and reporting information from the broad spectrum of interested parties about the broad spectrum of NPOs.

    New technology provides more opportunities to create more adequate evaluation tools. My personal favorite from this thread is the wiki-based idea, with tabs correlating narrative and numerical data.

  7. Joe Beckmann says:

    It’s good this thread has moved beyond metrics, since (a) input metrics are, for most nonprofits, remarkably confused with output and neighborhood benefits, (b) outcome measures are much, much more abstract and, paradoxically, seem so much more concrete than many might infer, and (c) indirect costs, the point of departure and usually the point of no return for such measures, have slipped into remarkable disuse and astounding inconsistency in the past 20 years.

    I’m also sorry that this thread seems to have lost the fervor of the earlier argument on behalf of transparency, which ought to be a standard (and not a metric) for any expression. If we focus a little on other potential standards, then the questions of metrics and of reliable, modestly consistent, reasonably comparable and timely reports make much more sense.

    Before that, however, let me note that I’m surprised no one has raised the MOST important innovation of 990 comparability: they can be filed electronically. That means they could – conceivably – be compared, over time, across agencies, across sectors. That all those data might be compared is much, much more interesting than that they exist idiosyncratically in each filing. Among other things, we could find out which data are useless for anyone, and thereby cut the form considerably; and, conversely, conduct a content analysis to compare narrative descriptions when present, which could, should, or might produce much more generalizable statements at least about sectors (health, education, etc.).

    That said, the real question of metrics is what, if compared, would lead to better decisions – by philanthropists as well as by the NPOs themselves. In a painful discussion with a former politician, for example, I asked which elementary schools sent the highest ratio of failing students to 9th grade in the high school. The only response I got was, “you can’t blame the schools,” when the question I asked was how they could be helped to help their kids better.

    This question of metrics is probably easier to explore in the context of education than many other NPO enterprises. Not only is there the blistering and ugly dialog about No Child Left Behind, but there really are some fairly simple metrics which, once explored, might lead to better reports and greater … transparency. In other words, it probably behooves this discussion to discuss nonprofit sectors before trying to generalize across sectors – a productive day care center is really hard to compare to a productive hospital or a productive Red Cross. And it won’t ever really compete in a 1:1 with either.

    That said, I was struck by one of Phil’s observations that it’s fairly easy to ask how many teachers teach, students learn, and so forth. That only showed that he didn’t know what schools do. When a school uses a volunteer, for example, the school gains both an input and an output, as well as a fundraiser and political advocate for public funding. As well as a satisfied parent, perhaps, or at least a neutralized nonparent. To measure such a “worker unit” in a single dimension ignores what might be, and probably is, a much larger unit of institutional impact.

    If it’s that hard to measure the simple number of teachers, it’s mind-boggling to explore square feet (when or if they take field trips), books (when or if they’re on the net), equipment (when or if they have homework or off-site activities), or even meals.

    It is much easier to track the unobtrusive measures – attendance, tardiness, staff turnover, hours of classtime, perhaps hours of homework, sometimes costs of materials – and to explore what measures might be needed to give a better profile of outcomes. Surely test scores are some of those measures, but many of the unobtrusives are as well – in their delta values, in the degree to which they may change with certain teachers, certain subjects, certain years, given certain conditions.

    With those kinds of data tracked, it is quite conceivable to construct a profile. And, with profiles constructed, we could quite conceivably compare schools, teachers, classrooms, and kids. And we might then begin – just begin – to identify what OTHER measures, either qualitative or quantitative, ought to be collected to make those profiles more useful – to both funders and to the users at all levels.

    Finally, I am really surprised that all this dialog about metrics ignores the largest of all funders – government. If an NGO is productive, it probably has government money as well. And if it has any government money, we might then begin to apply those metrics to government agencies – as you have elsewhere implied an evaluation of philanthropy and foundations. Now, is that to be political or “objective”?


  8. Joe, you do a good job of pointing out how multi-dimensional some measures are (i.e., that school volunteers are both inputs and outputs). Maybe “measure” is the wrong word altogether. Maybe we should talk about judging nonprofits instead. In court, a judge or jury decides on a case without pointing to a quantifiable set of measurements. Books and movies are judged without metrics as well.

    But how might we judge nonprofits in this way?

  9. Joe Beckmann says:

    One of the problems of developing metrics in a system which is not a system – but rather a “network” of many mini-systems – is that there are no realistic standards. “Cost-benefit” which works fine for lots of things is infinitely difficult to measure when, for just one example, the actual value of volunteers (one of the costs) is so hard to estimate (if they are big guns do they cost/value more? depends on what they’re doing and what you do about what they’re doing, and so forth).

    A much easier way to derive a “system” which could have some resonance at least among similar agencies in similar fields is to identify “best practices” and suggest what “best outcomes” “best reflect” those practices. This begins to build “rubrics,” and those rubrics can eventually become metrics, once there is enough common understanding of what they mean and how they apply.

    This need not be another library of cases (à la a B-school). It can be little snippets equivalent to the “knols” that the Google people talk about. For example, volunteers as promoters might be a “knol” with a one-paragraph snippet to summarize what we mean; likewise volunteers as “trainees,” as “assets,” as “Board candidates,” as “mentors,” etc. All of those could congregate in “volunteers as outcomes,” which contrasts with “volunteers as teacher supplements” or volunteers as resources.

    In court, a judge and jury disaggregate the evidence and synthesize a judgment based on the sides presented by counsel. This tactic builds rubrics which correspond to the “rules of evidence.” At least it’s a hell of a lot better than a simplification that reflects “spin” more than substance, which is more or less the way it now is.

  10. I think there is never going to be a metric, or even a series of metrics, that will a) fit all social enterprises, and b) fit all donors’ needs. What a museum does, what a soup kitchen does, what a clinic does, what a school does, what an advocacy group does, etc., are so fundamentally different that we simply cannot judge them by any common set of numbers. The important thing, methinks, is for the organizations themselves to say, here are our goals that we measure ourselves by, and here’s how we did.

    The second aspect to this is that different donors (including foundations) will value different things. Perhaps A believes that %-to-admin is the most important, while B looks at volunteer hours leveraged, and C wants to know cost per service unit. XYZ Foundation wants to see new programs developed, and ABC Foundation has their own set of outcome criteria.

    Just as investors in the equity markets will look at different metrics, depending on their strategies and beliefs, so do donors in the nonprofit market. The important thing, I believe, is to give donors access to good and verifiable and comparable data, and let them decide what is important.

    Juries are asked to judge a case based on a single criterion — The Law. I think that private investment, where both institutional and individual investors are all pursuing different strategies, is a closer model to what happens with donors and foundations.

  11. I want to point out to everyone who commented on this thread that another version of this debate is forming in the comments to my podcast interview with Phil Buchanan of the Center for Effective Philanthropy.

    You can find the new thread here. I’d love to have your input.

  12. Just came across – the info they put on profile pages is pretty good, and well presented. Could this approach be emulated?

  13. Mike Everett-Lane makes an important point – there will never be an overarching set of metrics for all nonprofits. To add to his comment that organizations have to decide what to measure, I would suggest establishing a strong link to the mission statement that focuses on outcomes of programs instead of merely the short-term outputs. Metrics that point to changes in behavior and attitudes present a compelling case that programs are indeed creating tangible community/societal benefit, which will attract not only more funding, but also more committed Board members, volunteers, and the like. You can find some instances (Nature Conservancy and Duke Children’s Hospital, to name a few) in our whitepaper. Let’s keep this dialogue going!

  14. So glad this conversation continues.

    I liked the comparison to movie evaluations above, but the problem there is that we rely on a particular reviewer’s expertise as a guide to how good or bad a movie is; the narrative is meaningful and revealing, but the opinion at the core is one we must trust if we are to place any weight in the review. This would be impossible in the nonprofit world, unless we all recognize a core group of gurus whose opinions will make or break any organization 🙂

    @Sean (w/ apologies for the delayed response): Transparency could be a wonderful marketing tool, in an ideal world. The problem is that it requires critical mass before it’s effective, and a larger critical mass before the dinosaurs will pay attention. If *everyone* were transparent, it would change the rules of the game. Till then, no one wants to be first, because it could backfire…and in traditional marketing models, describing your problems, frustrations, and failures in narrative form is sales suicide. No one wants to be first; they’d much rather write happy blog articles about how amazing and uplifting their work is.

    I like the idea of tightly tying an organization’s measurables to its mission statement, but the more I think about it the more I think a publicly malleable space for narrative text with an intentional community (perhaps slightly less fanatical than Wikipedia) is a fairly good solution. Let the world comment and critique, let the orgs write back, force a dialog…even for the orgs that would much prefer the comfort of their one-way websites.


  15. David Lynn says:

    Met with the local 2-1-1 provider here today, and discussed an interesting approach, wiki-based or otherwise: for any verbal recipients (obviously excludes animal rights and such), how hard would it be to have an established, generic survey about how much the organization helped, sent to the actual recipients of the service? Even if people lie or inflate, the assumption could be that people lie across the board, which would still give you reasonable relative numbers. With an online survey, that can’t be that hard to collect – and I guarantee that if it mattered to an NGO’s donors, they would be sure to tell their recipients to go rate.

    Similar to the sites that try to “rate your doctor”, etc, so there are models out there.

    Thanks for the good discussion.


  16. David,

    I like the idea (though organizations working on Digital Divide issues would challenge the ubiquity of an online medium). I think the issue would be in finding a “generic” set of questions. Something truly generic to all nonprofit areas of work would be very, very abstract…effectively turning into a “confidence vote” by the org’s constituents, kind of like Amazon’s rating tally.

    It’s also interesting to me (as a web developer) that building this kind of system would be relatively simple, technologically; promoting the platform to gain acceptance would be the difficult part. Thoughts?

  17. David Lynn says:

    – No way we will ever compare 100% across categories. That’s like saying you could use just the P/E to compare an early-stage biotech with an established brick-and-mortar retailer. It is very hard to have a one-metric comparison in that fashion, but with enough metrics, or within categories, they can be effective.

    – Groups like US News use such self-reporting survey systems to evaluate colleges, for example, so I can’t believe it wouldn’t work for nonprofits. If somebody like Google said “fill out this extensive survey in order to be listed in our directory”, I’m sure many would comply, just like the colleges do. So it becomes more of a matter of establishing a good survey (or surveys for various categories).

    – There have been attempts, such as


  18. Dave, regarding a core group of gurus: I don’t think that is so far-fetched, although the group might be quite large. In investment management there is a core set of analysts: people who work at investment firms and have certain credentials. However, you should note that for the most part, smart investors discount the importance of someone’s “credentials” and focus on the credibility of their argument. Also, Google’s Knol concept is based on the idea of “experts”.

    I like the idea of an open wiki, but I don’t object to the longer term creation of a community of expert nonprofit analysts.

  19. Dave, regarding “rate your doctor”. It seems to me that Yelp could be a good model. Great Nonprofits is already working on this. I think the “user review” model is part of the equation. But Holden Karnofsky has pointed out correctly that the question is not whether donors, volunteers, recipients, or staff like a nonprofit, but how effective the nonprofit is. I’m sure a homeless person would give a 5 star rating to a nonprofit that handed out stacks of cash each week, but that doesn’t mean it is effective. Volunteers and donors can fall into the same trap by giving high ranks to a nonprofit that holds wonderful appreciation events and makes supporters feel all warm and fuzzy when they visit.

  20. Dave, re: transparency. Your view assumes that if nonprofits are transparent, we’ll all see how bad things are inside and won’t want to give to them. I believe that many, many nonprofits are doing wonderful work and are full of dedicated, bright, passionate people. But all their “marketing department approved” messaging bores donors to tears.

    I think the first movers on transparency would naturally be some of the highest impact organizations (because these are the ones that would believe the most in themselves) and that they would find it to be a huge benefit.

    For a controversial example look at GiveWell. I’m sure some people think GiveWell showed a failure of transparency, but that’s missing the real story. GiveWell was two young guys who consistently admitted that they had all sorts of flaws and regularly said that they didn’t know the answer to numerous issues. They regularly claimed to NOT be experts in their chosen field. Yet they attracted Lucy Bernholz to their board, massive mainstream media coverage and potential funding from the Hewlett Foundation (still under consideration as far as I know).

    Showing people that they didn’t know a lot of things was not their downfall. Their problems came from anti-transparent actions.

    (PS: Personally I’m sick of talking about whether GiveWell was good or not, but I’m happy to discuss what GiveWell’s experiment in transparency means to nonprofits. I think their experiment was an objective success if you measure it by how transparency helped their organization succeed.)