
Free as in Puppies: Taking on the Community Resource Directory Problem

Last week Code for America (CfA) released Beyond Transparency: Open Data and the Future of Civic Innovation, an anthology of essays about open civic data. The book aims to examine what is needed to build an ecosystem in which open data can become the raw materials to drive more effective decision-making and efficient service delivery, spur economic activity, and empower citizens to take an active role in improving their own communities.

An ecosystem of open data? How might this brave new thing intersect with human service organizations? That’s mostly beyond the scope of CfA’s current book. One chapter, though—“Toward a Community Data Commons” by Greg Bloom—takes a very serious stab at resolving a perennial headache of information and referral efforts.

Bloom is trying to solve what he calls the community resource directory problem. Various players—government agencies, libraries, human service organizations—develop lists of community resources, i.e. programs to which people can be referred. Originally the lists were paper binders. Later they became databases. Almost always, each directory belongs to the agency that created and maintains it. That’s a problem: what the community needs isn’t a bunch of directories, it’s a single unified directory that would allow one-stop shopping. And the overlap among directories is inefficient too: each one has to separately update its information about the same programs.

One solution might be to somehow make directory data free and open to all. Bloom describes one experiment in that direction. It’s instructive because of the way it failed. Open211, which a CfA team built in 2011 in the Bay Area, scraped together directory data from a lot of sources and made it available via the web or mobile phone, and it also allowed users at large to submit data. As Bloom tells it: “This last part was key: Open 211 not only enabled users to create and improve their community’s resource directory data themselves, it was counting on them to do so. But the crowd didn’t come. This, it turns out, is precisely what administrators of 211 programs had predicted. In fact, the most effective 211s hire a team of researchers who spend their time calling agencies to solicit and verify their information.”

Exactly right. Reading this, I vividly remembered a project I did years ago: updating the Queens Library’s entire directory of immigrant-serving agencies. It took well over a hundred hours of tedious phone work. (There was time on the pavement too… for example, an afternoon wandering around Astoria looking for a Franciscan priest—I did not have his address or last name—who was rumored to be helpful to Portuguese-speaking immigrants. I never found him.) And then each piece of data had to be fashioned and arranged to fit into a consistent format.

That’s what goes into maintaining a high quality community resource directory. It will not just happen. It cannot be crowd-sourced. And this harsh fact—the labor cost of carefully curated information products—can be hard to reconcile with the aspirations of the open civic data movement.

The lesson: it’s certainly possible to collect community resource information and then set it free… but it will be, as Bloom says, free as in puppies. (This wry expression comes from the open source software movement, where people make the distinction between free as in beer—meaning something that can be used gratis—and free as in speech—meaning the liberty to creatively modify. Someone noticed that freely modifiable software might also require significant extra labor to maintain—like puppies offered for free that need to be fed, trained, and cleaned up after.)

But then Bloom takes the problem toward an interesting possible solution. He invokes the idea of a commons—not the libertarian commons of the proverbial tragedy but rather an associational commons that would include the shared resource (in this case, the data) and a formal set of social relationships around it. He suggests that a community data co-op might be an effective organizational framework for producing the common data pool and facilitating its use.

It’s an intriguing idea. It acknowledges the necessary complexity and cost of maintaining a directory. It might be able to leverage communitarian impulses among nonprofits. And if successful, it could be a far more efficient way of working than the current situation of multiple independent and overlapping directories. Of course, it would face all the usual practical difficulties that cooperatives do; but there’s no reason those should be insurmountable.

This framework could solve a lot of current problems in information and referral. But how might it eventually fit into some larger imagined ecosystem of open data?

Bloom offers a vision for how such a unified directory could be widely used by social workers, librarians, clients, community planners, and emergency managers. That seems entirely feasible, because all those players would need the same kind of information: which programs offer what services when and where.

But Bloom also takes the vision a couple of steps further, imagining a journalist seeing this same data while researching city contracts; and how the directory might be combined with other shared knowledge bases—about the money flowing into and out of these services, about the personal and social outcomes produced by the services, etc.—to be understood and applied by different people in different ways throughout this ecosystem as a collective wisdom about the “State of Our Community.”

This, I think, is where Bloom’s vision will face strong headwinds. I don’t mean political or inter-organizational resistance (though those might crop up too). The difficulty is that across the broad domain of human service data, very few sub-domains have achieved much clarity or uniformity. Community resource records happen to be one of the more advanced areas. For a couple of decades there’s been a library system standard (MARC21) for storing these records, and now AIRS offers an XSD data standard. So the way is fairly clear toward creating large standardized databases of community resources. Those could then be meshed with, say, the databases of 990 forms that U.S. nonprofits submit to the Internal Revenue Service. Outside of these few (relatively) clean small sub-domains, however, human service data gets very murky and chaotic indeed.

A project to mesh community resource records with data on contracts, funding sources and public expenditures, for example, would immediately run into the problem that the linkage would need to be made through the concept of a program. Yet that core term is used in very different ways. Sometimes it means a discrete set of resources, practices and goals; sometimes it implies a separate organizational structure; and sometimes it seems to be a mere synonym for a funding stream. The use of that term would need to be tightened, or it would need to be replaced by some clearer concept. But even then, people trying to mesh community resource data with fiscal administrative data would find that the latter are equally unruly. A city’s contract with a nonprofit, for example, may fund one program or many; and the original source of the contract’s funding may be one or many government funding streams from the federal, state or city level. There is no uniform pattern for how these arrangements must be made, nor are there well-developed data standards.
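To make the linkage problem concrete, here is a minimal sketch of the many-to-many arrangement described above: one contract may fund several programs and may draw on several funding streams. All class names, field names, and sample values are hypothetical illustrations, not part of any existing standard.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class FundingStream:
    name: str
    level: str  # "federal", "state", or "city"


@dataclass(frozen=True)
class Program:
    name: str
    agency: str


@dataclass
class Contract:
    contract_id: str
    # One contract may fund one program or many.
    programs: list[Program] = field(default_factory=list)
    # Share of the contract's money drawn from each stream (expected to sum to 1.0).
    funding_shares: dict[FundingStream, float] = field(default_factory=dict)


def streams_behind_program(program: Program, contracts: list[Contract]) -> set[str]:
    """Return the names of every funding stream that reaches a program via any contract."""
    return {
        stream.name
        for contract in contracts
        if program in contract.programs
        for stream in contract.funding_shares
    }


# Hypothetical sample data.
federal_grant = FundingStream("Federal housing grant", "federal")
city_levy = FundingStream("City tax levy", "city")
shelter = Program("Family shelter", "Example Agency")
outreach = Program("Street outreach", "Example Agency")

contracts = [
    Contract("C-001", [shelter, outreach], {federal_grant: 0.6, city_levy: 0.4}),
    Contract("C-002", [outreach], {city_levy: 1.0}),
]

print(streams_behind_program(shelter, contracts))
```

Even in this toy form, answering "how much federal money reaches the shelter?" requires an allocation rule that the contract itself does not supply, which is the kind of ambiguity a shared standard would have to settle.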

Meshing community resource records with programmatic statistics such as outputs and outcomes would be equally fraught. While there’s a movement toward standardizing some performance measures, concrete results on the ground have been slow in coming. Even if complications from the politics around performance measurement were miraculously eliminated, there would still be the issue of murky and chaotic data that don’t easily support performance measures.

So what’s the solution?

In a nutshell: for the human service sector’s work to become a significant part of the ecosystem of open civic data downstream, the sector will have to embark on a new kind of conversation about the way data is organized upstream. This will necessarily be a longer-term conversation. It will have to involve a more diverse set of stakeholders than are usually assembled at the same table. It will have to ask (and answer) unfamiliar questions, such as: how can information system designers create good interrogable models of public service work rather than merely meeting stated user requirements? It will have to take a hard look at sub-domains that have often not been modeled very well. (Funny-looking taxonomies are an important red flag for identifying those.)

Eventually, that kind of conversation can lead the sector toward far more coherent and holistic ways of organizing its data. The downstream benefits: more successful information system projects, more efficient production of performance measures, and more meaningful data for open civic uses.

—Derek Coursen

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License

6 Comments on “Free as in Puppies: Taking on the Community Resource Directory Problem”

  1. What a concise, interesting analysis of a fundamental challenge in the field (some would say *the* most fundamental). I’ve been involved with this issue for 25 years or so, since the early InfoLine and AIRS days, and I think this is just dead on.

    The need for curation of content in a crowd-sourced era is never more clear than in human services, and yet the costs are high. The Yelp model doesn’t exactly work for human services, though it has interesting implications, chiefly its claim of transparency in a “virtual community” setting. The key difference is that a bogus restaurant review on Yelp has minimal consequences – a bad meal when you expected a good one – whereas the bogus review of a child care provider can be much more serious. Someone has to validate the data, especially if it is ultimately used for purposes other than service referral, as Mr. Bloom suggests.

    I’m curious where you see NIEM fitting in with the move toward standards-based data modeling in human services. Thoughts?

    • Thanks, Edward! Yes, I think Greg has hit the nail on the head about the issue that the I&R field needs to solve.

      I agree with you, the Yelp model could be very problematic for human service providers. Perhaps something closer to a LinkedIn model might be better. I can imagine a web platform that would allow each agency to create and maintain its own program and site records. The platform could have guidance about how to enter data, and rules-based prompting to encourage people to review and update their records regularly. But it would still need staff with information science skills to curate and do quality assurance.
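As a rough illustration of the rules-based prompting idea, the sketch below flags records that have gone too long without review. The record fields, the 180-day threshold, and the sample data are all assumptions made for the example; a real platform would choose its own rules.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class ProgramRecord:
    agency: str
    program: str
    last_reviewed: date


def records_needing_review(records, today, max_age_days=180):
    """Return records whose last review is older than max_age_days, oldest first."""
    cutoff = today - timedelta(days=max_age_days)
    stale = [r for r in records if r.last_reviewed < cutoff]
    return sorted(stale, key=lambda r: r.last_reviewed)


# Hypothetical sample data.
records = [
    ProgramRecord("Agency A", "Food pantry", date(2013, 3, 1)),
    ProgramRecord("Agency B", "Legal clinic", date(2013, 11, 15)),
    ProgramRecord("Agency C", "ESL classes", date(2012, 12, 10)),
]

for record in records_needing_review(records, today=date(2014, 1, 1)):
    print(f"Remind {record.agency} to review '{record.program}'")
```

A rule like this only produces the reminder. Whether the agency's update is accurate would still be a question for the curating staff.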

      I’m still working on figuring out the implications of NIEM for standards-based data modeling. I read the draft document about client and case management that Johns Hopkins put out last year, and it has some info about their conceptual data model on pp 9-10. It looks generally very solid, but most of the prose is about business processes; I wish there were more verbal explanation of the reasoning behind entity definitions and cardinality. I think verbosely spelling out one’s assumptions is a really important technique for validating the quality of a data model with all the possible stakeholders. Elsewhere I’ve tried to do that with a generic court data model and parts of human service data models.

  2. Derek, thanks for the links to those articles. Yours looks like a treasure trove and I can’t wait to read it in detail.

    I think an interesting challenge with NIEM is what I’d call implementation subsetting. To be as broadly applicable as possible, NIEM needs to be cast at a pretty high level, fairly abstracted. Specific users of the standard will always need to build out subsets (adapters, etc…different terms are used), a process which, ideally, should sit in a feedback loop with the more abstracted standard. E.g., if in building a subset for, say, substance abuse treatment referrals, an agency or software vendor discovers that the abstracted framework doesn’t quite work, there should be a way to iterate on it with some rapidity. The very nature of the multi-stakeholder NIEM process makes that all but impossible (the “rapidity” part, anyway). A software developer fully in control of both the abstract model and the implementation subset doesn’t have that challenge. Look how many years/decades it took HL7 to evolve.

    That’s my puzzle with regard to NIEM. Our own Casebook data model actually slots into NIEM fairly well (and extends a considerable way beyond it), but any adapter we build will necessarily require a consumer of our data to know plenty about our native model, making the NIEM aspect less relevant, in a way. Our current integration API is very robust and detailed; yes, we could wrap it in NIEM, and probably will, but does that reduce the work of the consuming system? Not by a lot. Building an API from an abstract data model is a tough row to hoe, and may not be particularly sustainable unless the model is far more prescriptive…which might stifle innovation and pose real challenges for already-mature systems.

    I totally support the drive toward NIEM, don’t get me wrong. It’s just the practical use of it that still seems sketchy. We’ll see.

    Thanks again for the thought provoking work.

    • Edward, thanks very much for these thoughts. I’ve done a lot of data modeling toward seeking optimal flexibility and abstraction, but I’ve never built an API and it never occurred to me that the data modeler’s approach would significantly affect how much work it is to build or maintain one. Is that because an abstract data model pushes so much of the substantive meaning into lookup table values? For the last fifteen years I have been chewing on the implications of Dave Hay’s universal data model in Data Model Patterns. It contains fewer than twenty entities and can encompass everything in the cosmos. At first I thought he was giving a humorous reductio ad absurdum of his own approach. Lately I’ve been interpreting it as a warning to be careful about punting too much semantic content out of the model and into the values!

      I suppose having a lot of generalization hierarchies might also make an API more complex to manage.

      • I think it depends on the strategy you use to build out your API from the abstracted model.

        The use of lookup tables is one tempting way – they avoid the constraints of using enumerated types in an XSD – but they can make data extraction and reporting (and ultimately data exchange between systems) more complicated. (Side note: NIEM’s extensive reliance on fixed enums worries me exactly because it imposes a prescriptive view of a domain that may not make sense in my sub-domain, or for a specific state agency, say, which legitimately categorizes things like service types in its own way.) Use of lookup tables is not a bad approach at all within the confines of a single system working in isolation, because it balances the prescriptive flavor of the abstract model with the realities of individual implementations.

        But ultimately, in my view, an API shouldn’t rely on a particular storage implementation such as a database. If an API provides a contract between systems in an integrated ecosystem, generally system A does not have access to system B’s table of lookup values, so the API has to encapsulate the shared definitions somehow, or some external mechanism has to be used to keep the lookup tables in sync. And that’s assuming that system B does not need to transform or map system A’s data as it consumes it.
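One way to picture the encapsulation point is a small sketch in which the shared code list is part of the API contract and each system translates its private lookup keys before sending. The codes, table contents, and function names below are hypothetical and are not drawn from NIEM or any real API.

```python
# Shared vocabulary published as part of the API contract (hypothetical codes).
SHARED_SERVICE_TYPES = {"FOOD", "HOUSING", "LEGAL"}

# System A's private lookup table, which partner systems cannot see.
SYSTEM_A_LOOKUP = {1: "Food pantry", 2: "Emergency shelter", 3: "Immigration clinic"}

# System A's mapping from its local lookup keys to the shared codes.
SYSTEM_A_TO_SHARED = {1: "FOOD", 2: "HOUSING", 3: "LEGAL"}


def export_record(local_record: dict) -> dict:
    """Translate a System A record into the shared vocabulary before sending it."""
    local_key = local_record["service_type_id"]
    return {
        "program": local_record["program"],
        "service_type": SYSTEM_A_TO_SHARED[local_key],
    }


def import_record(payload: dict) -> dict:
    """System B accepts only codes defined in the shared contract."""
    if payload["service_type"] not in SHARED_SERVICE_TYPES:
        raise ValueError(f"Unknown service type: {payload['service_type']!r}")
    return payload


outgoing = export_record({"program": "Winter shelter", "service_type_id": 2})
print(import_record(outgoing))
```

Sending the raw key 2 instead would be meaningless to System B, since only System A holds the table that gives that number its meaning.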

        Similar complexities follow from the table-driven type extension approach, where implementers can create meta-data through a system of tables that hold dynamic definitions of extended or entirely new data types. In this scenario, a table-based approach is used not just for unary lookup values, but for the API’s extended type definitions themselves. This is actually how some “canned” application development tools do it. You define a new type through an interactive interface, and the details for the new type are stored in special database tables that can be referenced in building your application. One problem is that when you want to do analytics, or any kind of data extraction, your ETL tool has to somehow know how to navigate those definitional metadata tables to understand your data model. Nearly impossible in many cases. A greater problem, as with the lookup table approach, is that a partner system doesn’t have access to your type definitions, because they are in your database on your side of the wall. So it’s a nonstarter for APIs, really.
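A toy version of the table-driven approach, using an in-memory SQLite database, may help show why extraction is awkward. The table names, column names, and the sample "HousingAssessment" type are invented for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Definitional metadata: the data types themselves are stored as rows.
    CREATE TABLE type_definition (type_id INTEGER PRIMARY KEY, type_name TEXT);
    CREATE TABLE attribute_definition (
        attr_id INTEGER PRIMARY KEY, type_id INTEGER, attr_name TEXT);
    -- Actual data: every value is a row pointing at an attribute definition.
    CREATE TABLE attribute_value (record_id INTEGER, attr_id INTEGER, value TEXT);

    INSERT INTO type_definition VALUES (1, 'HousingAssessment');
    INSERT INTO attribute_definition VALUES
        (10, 1, 'household_size'), (11, 1, 'currently_sheltered');
    INSERT INTO attribute_value VALUES (500, 10, '4'), (500, 11, 'yes');
""")


def load_record(conn, record_id):
    """Rebuild one record by joining its values to the definition tables."""
    rows = conn.execute(
        """
        SELECT d.attr_name, v.value
        FROM attribute_value AS v
        JOIN attribute_definition AS d ON d.attr_id = v.attr_id
        WHERE v.record_id = ?
        """,
        (record_id,),
    ).fetchall()
    return dict(rows)


print(load_record(conn, 500))
```

Without the definition tables, the stored values are just anonymous strings keyed by numbers, which is the position an ETL tool or a partner system starts from.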

        The more classic approach, I think, is the class-based system. This treats the abstract model as a framework of classes from which one can derive implementation-specific subclasses. Call it the traditional C++ approach. Current XSD syntax doesn’t make this particularly easy to specify, unfortunately; support of type hierarchies is dodgy at best. The API developer is tempted to create the needed entities/attributes in whatever way makes sense for their application, then pass these as a sort of black-box “payload” within an instance of the abstract class. It can work, but it somewhat undercuts the value of the abstract class.
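The contrast between a derived subclass and a black-box payload can be sketched roughly as follows. The Referral class is a hypothetical stand-in for an entity in an abstract model; it is not taken from NIEM, and the field names are assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Referral:
    """Stand-in for an entity in the abstract model (hypothetical, not NIEM)."""
    client_id: str
    service_type: str


@dataclass
class SubstanceAbuseReferral(Referral):
    """Implementation-specific subclass: the extra attribute is typed and visible."""
    treatment_setting: str = "outpatient"


@dataclass
class ReferralWithPayload(Referral):
    """Black-box alternative: extras ride along in an untyped dictionary."""
    payload: dict = field(default_factory=dict)


def describe(referral: Referral) -> str:
    """A consumer that knows only the abstract class sees only the shared fields."""
    return f"{referral.client_id}: {referral.service_type}"


typed = SubstanceAbuseReferral("c1", "substance abuse treatment", "residential")
opaque = ReferralWithPayload(
    "c1", "substance abuse treatment", {"treatment_setting": "residential"}
)

print(describe(typed))
print(describe(opaque))
```

Both objects pass through a consumer of the abstract class, but only the subclass declares what the extra data means. With the payload version, the consumer has to learn the sender's native model some other way.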

        In short: no perfect solution that I’m aware of. There’s no getting around the need for extending the class in ways that the original modeler never foresaw, and that’s where it can get conflictual and complex. In a way, the challenge lies in the fact that both an abstract model like NIEM **and** an API seek to be context-free, to a degree, and that is trying to square a circle!

        Sorry for the long reply. It’s something I think about a bit. 😉

