This entry to the One Post Challenge comes from Matt (who is withholding his last name). Matt is an employee in the development office of a nonprofit devoted to providing fracture care in developing countries. These are his views, not those of his employer.
By Matt
The Burden of “Burden of Disease”
Working in development, I get frustrated with some foundations’ tendency to have rigid, mathematically determined goals that we simply can’t furnish the data to meet.
Individual people respond well to our campaigns – they intuitively grasp that fractures are a real problem. We can speak to them on an emotional basis, and they don’t feel patronized; we can use anecdotes, and they don’t cry out for footnoted tables. Foundations, on the other hand, seem to have an obsessive need for facts and figures. This is understandable, given their duty to weed out programs that aren’t necessary or cost-effective. What irks me is the extreme that this bean-counting goes to. I’m speaking of the apples-to-oranges comparison known as Disability-Adjusted Life Years – a measurement of mortality and morbidity, or the “burden of disease”. What I want to say is, “Here – we have numbers on how much money we save patients, how much time in the hospital they can avoid, how many patients we’ve served, how many dollars we’ve spent, how many instruments and implants we’ve produced – the only figure we don’t have is DALYs. Pick something else, anything else, in any units – furlongs per nanosecond, I don’t care. No more DALYs.” We simply don’t have that data. We can estimate, extrapolate – we could even prevaricate (I would get fired, but we could). But we don’t have the numbers.
That puts us at a severe disadvantage writing to a large foundation, one that’s never considered injuries to be comparable to AIDS, TB, malaria, and the like. Providing our own metric doesn’t wash. Nobody wants us controlling the terms; that would be akin to allowing OJ to handle the glove. Instead, I’m forced to say things like, “Assuming one surgery can avert at least three DALYs, our methods are more cost-effective than preventing the mother-to-child transmission of HIV.” It’s a flattering comparison, but “assuming” casts it into doubt. I have little choice in the phrasing, though, because it is an assumption.
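For what it’s worth, the arithmetic behind that sentence is trivial once you grant the assumption. Here is a minimal sketch with placeholder numbers – neither our actual cost per surgery nor any particular published benchmark, just the shape of the calculation:

```python
# Hypothetical cost-per-DALY comparison. Every number below is a
# placeholder, not a real figure from our program or from any study.

cost_per_surgery = 300.0   # assumed cost of one fracture surgery (USD)
dalys_per_surgery = 3.0    # the assumption doing all the work

cost_per_daly = cost_per_surgery / dalys_per_surgery
print(f"${cost_per_daly:.0f} per DALY averted")  # $100

# If a benchmark intervention ran at, say, $150 per DALY averted, we'd
# look more cost-effective -- but halve the assumed 3 DALYs per surgery
# and the comparison flips. The entire claim hangs on one guessed number.
```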
Measuring the impact of our treatments with any certainty and objectivity would require 1) a precise estimation of a disability weight (difficult to do) and 2) much more data from our patients. Asking them to return to the hospital to submit that data would be, frankly, inexcusable, considering the hardship some of them go through to even get to the hospital in the first place. The only remotely comparable demand in our lives is a summons to jury duty. Imagine being summoned to jury duty hundreds of miles away, over rocky terrain, while you’re working twelve hours a day to feed your family of five, then being greeted by a battery of tests and dismissed without any compensation. We could conceivably do this. By giving them a surgical implant, we create a feeling of gratitude and indebtedness, and we could exploit that. What I want to know is, who in their right mind would think it justifiable to exploit patients’ gratitude, waste their time, waste their doctors’ time, and waste our own funding (paying for the doctors’ time), simply on account of one column of data? When foundations demand DALYs from organizations like mine, that’s what they’re asking us to do.
This may seem like an exaggeration, so let me take the obvious objections one at a time.
1) Disability weights can’t be that hard to guess, can they? Take a look at this table. There’s no differentiation between treated and untreated broken bones. There’s no definition of “short term” or “long term.” There’s no “long term” data for many fractures. And this is only the start of the problem.
2) What more data do you need? We know that our surgeries avert treatment by traction or amputation. What we don’t know is the ratio of those treatments, or the increase in DALYs averted due to the switch to our treatment. There’s no DALY weight for traction; it’s not an illness. If we had data on the percentage of traction cases that end in failure – malunion, nonunion, gangrene – perhaps we could calculate DALYs for those conditions, right? Nope, there are no values for them. Well, what about amputation? The difference between “treated femur fracture, long-term” and “leg amputation, long-term”: 0.028. Essentially none (see the worked example after this list). This is no doubt due to “treated” being a misnomer in the data table. So, what can we use instead of that weight? Data from the patient. If they report normal leg function, for example, that’s tantamount to zero disability. If they don’t – well, we’re back to guesswork, but at least we have data to work with.
3) Why must patients return in person? Ask someone how they’re doing, and they’ll reply “fine.” Ask a patient how their fracture has healed, and they’ll reply “fine” – even if your questions are so precise as to demand a numerical response, no questionnaire will be accurate. They need to be examined in person for the data to be objective.
4) Say you do this only in urban areas, where patients can return without too much trouble – wouldn’t that work? In a certain sense. We would be able to estimate DALYs averted on some basis – a skewed basis that we couldn’t justify using for rural patients, who have different injury patterns and post-recovery activity. Further problems that would arise: it would still be an estimate. It would still be unreliable. It would still cost time and money to collect. And it would still be on the foundations’ terms. Even in the very best case, we can’t change the basis of the argument to something rational, because foundations don’t accept that DALYs aren’t a nice little simulacrum of reality. We’re stuck within that framework until foundations realize that DALYs are the wrong measurement for complex conditions.
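Here is the worked example promised in item 2 – a minimal sketch with assumed numbers. The weights are placeholders chosen only to match the 0.028 gap, the remaining life expectancy is assumed, and I’ve ignored the age-weighting and discounting that burden-of-disease studies apply:

```python
# Long-term DALYs scale roughly with disability weight times years lived
# with the condition.

w_treated_fracture = 0.272  # placeholder: "treated femur fracture, long-term"
w_amputation = 0.300        # placeholder: "leg amputation, long-term"
years_remaining = 30.0      # assumed remaining life expectancy

dalys_gap = (w_amputation - w_treated_fracture) * years_remaining
print(f"{dalys_gap:.2f} DALYs")  # 0.84 -- on paper, three decades with a
                                 # saved leg beat amputation by less than
                                 # one DALY
```

That is the perverse result of a misnomered weight: a surgery that saves a leg looks barely better than an amputation.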
10 Comments
Thanks for the opportunity to write here. I have to say that it’s intimidating to be following the post by “a fundraiser,” since it’s swept the competition.
As a side note, I’m slightly curious what “a fundraiser” is going to do now that Pride at Work has effectively won his or her mini-competition, since Pride at Work is ineligible for Network for Good donations. It’ll be interesting to see. He or she may be out $500 of their own money.
You don’t exaggerate the wackiness of funders when it comes to their expectations about metrics and outcomes.
I think the knee-jerk response is a genetic tic activated by too many boring philanthropy conferences.
Last week, for example, I had to wipe spittle off my brow when a funder colleague ranted about metrics and impact of a particular program supported by a particular funder collaborative.
“We don’t have any data to show X so nobody will want to fund us.”
She assumed that the “X” was the be all and end all of defining program outcomes.
Before her self-satisfied glow could cause further irritation to other members, we reminded her that there were many other measurable outcomes of the program.
We learned from the experience that we needed to do a better job of communicating the complexity of the program, the challenges of its evaluation, and the importance of many positive outcomes beyond the magic X.
Philanthropoids often only see what they are looking for.
“Not everything that counts can be counted, and not everything that can be counted counts.” (Sign hanging in Einstein’s office at Princeton)
What is the “X” that lets you know that Casablanca is a better movie than Look Who’s Talking, Too? Or that Star Wars is better than Bio-Dome with Pauly Shore?
I wrote a long post about this issue back in March that you can find here.
Matt, I wanted to respond to your post. Although Pride At Work has a 501(c)(4) status with the IRS, we do have a 501(c)(3) fiscal sponsor, the LGBT Labor Leadership Initiative, which allows us to receive tax-deductible donations. The vast majority of the advocacy and education work we do for LGBT working people falls into this category. Check us out at http://www.prideatwork.org. Feel free to email me if you have any other questions. Thanks.
Matt,
I don’t have it in front of me, so my recollection may be faulty, but I think Amartya Sen, the Nobel laureate economist, made a compelling argument in his book “Development as Freedom” that DALYs are the wrong measure entirely – that instead the focus should be on what kind of real capabilities people in a society have. Sounds like the issue with fractures goes right to the capabilities of the person who is treated, and the impact treatment has on their abilities and life chances, rather than morbidity. If you haven’t read it, his arguments might be helpful.
Thanks for your post.
Thanks very much for explaining, Jo! I was working from the (faulty) assumption that a 501(c)(4) nonprofit would be out of luck, due to Network for Good’s policy. It would’ve been disappointing for all concerned to see a 501(c)(3) charity with, say, 3 votes take all the benefit of Pride at Work’s 60 supporters.
Sean, the movie analogy confuses me. There are plenty of X’s (some completely arbitrary) that will separate the very best from the very worst, at least in terms of movies. Even if you throw out things like aggregated scores from Metacritic and Rotten Tomatoes for being essentially collections of opinions, how about tickets sold divided by seats available (or screens available times median theater size, if you don’t want to count the actual seats)? Sure, that reflects popularity, not merit (although the metric for this competition claims that those things are synonymous), but you get the idea. Where metrics for subjective subjects get really interesting is at the very top of the continuum – ask a Brando fan to decide between Streetcar and On the Waterfront, or a comic book geek to decide between Spider-Man II and Sin City. Or, as it applies to nonprofits, ask a foundation to pick between saving a child from contracting malaria or saving two adults from having their legs amputated. This is the point where foundations should be asking about the precision and accuracy of the metrics they ask for, and looking into the uncertainty that creeps in at each step. When they don’t, it becomes a game of which methods a grantee can use to inflate their numerical results, and the ethical applicants lose out.
Am I wrong? I might be. I’m new to this field. Please point out the errors in my thought process; I’m willing to learn.
Matt, aggregated scores are the quantification of a qualitative metric. The reviewers assign a number to their judgment of how good a movie is, yet they can’t actually “measure” how good it is. My point is that they do not use metrics like “number of actors per scene”, or even “ratio of good dialog to bad dialog”; they just think about the movie on a qualitative basis.
A movie like Saving Private Ryan is “good” in a different way than Knocked Up is “good” (both critically acclaimed). Just like saving the whales is good in a different way from teaching children to read. Yet we still as a society have a general agreement on which movies are good and which aren’t. We don’t even have that basic level of agreement with nonprofits. Ask any person on the street to name 5 great nonprofits and they can only name nonprofits that they have personal experience with, or big, well-known (but not always “good”) nonprofits. Try the same question with for-profit companies and you’ll get names like Starbucks, Apple, and Google again and again.
No one claims that the best-selling movies are the “best” because we generally do not equate popularity with quality.
So what about the metric for winning this contest? The point of this contest was not to find the “best idea”; it was to “encourage the philanthropy blog conversation” and to “engage in conversation”. I wanted to use an objective, transparent metric, and so I decided that the best way to measure the extent to which a post encouraged a conversation was to measure the number of participants in that conversation.
Not a perfect metric, but an OK one. Think about some sort of rally (like a protest rally): the number of participants might not be the best metric for how much impact the rally had, but it is a decent one (and often used). That at least was my thought process.
Aggregated movie scores are actually a really good analogy for how nonprofit performance is often measured. Look at the DALY weights: subjective. The methods used to arrive at them are, quite literally, aggregations of qualitative measurements (if somebody wants to detail the time trade-off and standard gamble methods here, I’d be obliged; suffice to say, they’re not objective). Any answer to the question “how debilitating is this disability” will likewise have an element of subjectivity. It’s an utterly subjective question.
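Since I brought them up, here is a rough sketch of both elicitation methods as I understand them, with invented responses (real studies pool many respondents; the numbers are purely illustrative):

```python
# Two standard ways of eliciting a disability weight, sketched with
# invented responses.

def tto_weight(years_traded: float, horizon: float = 10.0) -> float:
    """Time trade-off: the respondent is indifferent between `horizon`
    years in the health state and (horizon - years_traded) years in full
    health. The disability weight is the fraction of life they'd trade."""
    return years_traded / horizon

def sg_weight(p_full_health: float) -> float:
    """Standard gamble: the respondent is indifferent between the health
    state for certain and a gamble with probability p of full health and
    (1 - p) of death. The disability weight is 1 - p."""
    return 1.0 - p_full_health

# The same hypothetical condition, two methods, two different answers:
print(f"{tto_weight(years_traded=2.0):.2f}")   # 0.20
print(f"{sg_weight(p_full_health=0.85):.2f}")  # 0.15
```

Both reduce to asking people how bad they imagine a condition to be – which is exactly the subjectivity I mean.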
The difference between Rotten Tomatoes rating movies and nonprofits calculating DALYs – and one of the points I’m trying to make – is that Rotten Tomatoes does its aggregation and calculation in a standardized way, the same every time. And this enables consumers to choose between movies that are nearly equally good, purely based on a number – realizing that there’s going to be some error, and knowing where the errors come from. But DALYs are subject to variations in methods at each stage of calculation, which is what makes the upper tier of cost-effective applicants so interesting – and potentially so misleading. That’s why I’d prefer that foundations avoid the pitfall of pretending DALYs are a catch-all solution to the measurement problem.
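To make “variations in methods” concrete with one assumed example: some burden-of-disease calculations discount future years of healthy life at 3% per year and some don’t, and that single choice can move the same case’s total by a third:

```python
import math

def yld(weight: float, years: float, r: float = 0.0) -> float:
    """Years lived with disability for one case, optionally discounted
    continuously at rate r. A simplified version of the standard GBD
    formula, with age weights omitted."""
    if r == 0.0:
        return weight * years
    return weight * (1 - math.exp(-r * years)) / r

# Same condition (assumed weight 0.3), same 30-year duration, two choices:
print(f"{yld(0.3, 30):.1f}")        # 9.0 DALYs, undiscounted
print(f"{yld(0.3, 30, 0.03):.1f}")  # 5.9 DALYs at a 3% discount rate
```

Two organizations could report those two numbers for identical patients, and a funder comparing them would never know.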
Matt,
Great discussion. I admit that there are some flaws in the movie analogy (Holden Karnofsky has outlined a number of them, particularly the fact that the viewer of the movie is the customer, whereas donors do not “experience” the product, which, for nonprofits, is instead delivered to someone else).
But I think you’re missing my point with DALYs. Movie ratings are not just qualitative; they rate the movie as a whole rather than aggregating sub-metrics which are supposed to be elements of what makes a movie good.
The foundations you cite are not trying to impact DALYs; they are trying to improve quality of life. Your nonprofit improves quality of life without impacting DALYs. That’s the problem.
A movie can be great with no award-winning actors. Yet potential moviegoers will sometimes see a movie because it has a great actor in it. The existence of a great actor can be an indicator of a good movie. But there are LOTS of great movies without great actors.
My point is that focusing on a metric like DALYs can make a funder miss the whole point. They are trying to fund great nonprofits that further their mission, but they get lost in impact metrics.
My wish is that we focus on rating nonprofits and funneling money towards nonprofits which generate impact, without getting lost in quantitative metrics.
Someone asked what charities can be supported via the Good Card. The answer is more than one million – any 501(c)(3) that is in good standing with the IRS.