Divisions can be seen as the smallest building block in the Open Civic Data ecosystem: Jurisdictions and Organizations exist within a Division, and People are elected to represent a Division. As such, providing unique identifiers for Divisions enables collaboration across groups dealing with any of these types.
This proposal predates the formal proposal process by a full year. It was originally part of the ocd-division-ids repository, and the identifiers are already in use by Sunlight, Google, Granicus, Open North, Open Elections, and several other projects. (This document exists to formalize what was already decided.)
Identifiers are in the format ocd-division/country:<country_code>(/<type>:<type_id>)*
country_code: An ISO-3166-1 alpha-2 code.
type: The type of boundary (e.g. country, state, town, city, cd, sldl, sldu).
- Valid characters are lowercase UTF-8 letters, hyphen (-), and underscore (_).
- Use existing types where possible.
type_id: An identifier that is locally unique to its scope.
- Valid characters are lowercase UTF-8 letters, numerals (0-9), period (.), hyphen (-), underscore (_), and tilde (~). These characters match the unreserved characters in a URI (RFC 3986, section 2.3).
- Characters must be converted to UTF-8.
- Uppercase characters must be converted to lowercase.
- Spaces must be converted to underscores.
- All invalid characters must be converted to tildes (~).
- Leading zeros should be removed unless doing so changes the meaning of the identifier.
- If possible, all divisions of the same type should be defined at the same time; for example, all state divisions should be defined at once. Similarly, all cities in North Carolina should be defined at once, to avoid adopting a scheme that produces collisions.
- When selecting a type_id, preference should be given to existing, common identifiers, like postal abbreviations for US states. Numeric identifiers (such as US county FIPS codes) should be avoided if textual names are clear and unambiguous; however, numeric identifiers may be appended to disambiguate a type_id.
- The set of types within each country should not grow unnecessarily. Each country maintainer should publish a list of types for easy reference. The addition of a new type must be justified.
- For example, in the US there are no clear-cut differences between cities, towns, villages, etc. Therefore, the Census-recommended term place is used as the type for all of them.
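The conversion rules for a type_id and the overall identifier format can be sketched in Python as follows. This is a minimal illustration, not part of the proposal: the function names are hypothetical, `str.isalpha()` stands in for "letters", and leading zeros are stripped only from purely numeric values, since deciding whether removal "changes the meaning" generally needs human judgment. Identifiers produced by this sketch are not checked against the published CSV files.

```python
import re

# Non-letter characters the text allows in a type_id.
_TYPE_ID_EXTRA = set("0123456789.-_~")
# Non-letter characters the text allows in a type.
_TYPE_EXTRA = set("-_")


def _is_lower_letter(ch: str) -> bool:
    """True for a letter that is unchanged by lowercasing."""
    return ch.isalpha() and ch == ch.lower()


def normalize_type_id(raw: str) -> str:
    """Apply the type_id conversion rules in the order listed above.

    Python strings are already Unicode, so the UTF-8 step happens
    when the result is written out.
    """
    text = raw.lower().replace(" ", "_")
    cleaned = "".join(
        ch if (ch.isalpha() or ch in _TYPE_ID_EXTRA) else "~" for ch in text
    )
    # Assumption: only purely numeric ids have their leading zeros removed.
    if cleaned.isdigit():
        cleaned = cleaned.lstrip("0") or "0"
    return cleaned


def build_division_id(country_code: str, *parts: tuple[str, str]) -> str:
    """Join a country code and (type, type_id) pairs into an identifier."""
    segments = [f"country:{country_code.lower()}"]
    segments += [f"{type_}:{normalize_type_id(type_id)}" for type_, type_id in parts]
    return "ocd-division/" + "/".join(segments)


def is_valid_division_id(identifier: str) -> bool:
    """Check an identifier against the format and character rules."""
    prefix, _, rest = identifier.partition("/")
    if prefix != "ocd-division" or not rest:
        return False
    for index, segment in enumerate(rest.split("/")):
        type_, sep, type_id = segment.partition(":")
        if not sep or not type_ or not type_id:
            return False
        if not all(_is_lower_letter(c) or c in _TYPE_EXTRA for c in type_):
            return False
        if not all(_is_lower_letter(c) or c in _TYPE_ID_EXTRA for c in type_id):
            return False
        # The first segment must be country:<two-letter code>.
        if index == 0 and (type_ != "country" or not re.fullmatch(r"[a-z]{2}", type_id)):
            return False
    return True
```

For instance, `build_division_id("US", ("state", "NC"), ("cd", "02"))` yields `ocd-division/country:us/state:nc/cd:2` under these assumptions.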
The identifiers directory contains CSV files assigning all OCD identifiers:
- A single CSV file per country, in the format country-<country_code>.csv.
- The URLs of these files are stable.
- An optional directory per country, in the format country-<country_code>:
- A file hierarchy, in which CSV files describe parts of the top-level country CSV file.
- The URLs of these files are not stable.
The corrections directory contains CSV files that map incorrect OCD identifiers to correct OCD identifiers. Common errors include missing diacritics, differences in hyphenation and word order, use of Roman numerals, etc.
If a CSV file has no header row, the CSV is assumed to have two columns with the headers id and name.
If a CSV file has a header row, the first column name must be id.
Column names with special meaning are:
- name: The name of the division.
- sameAs: An OCD identifier which identifies the same division as this identifier. The row corresponding to the identifier in this column must have a blank value in its sameAs column, i.e. there must be no daisy-chaining or circular references.
- A note describing how or why the division has multiple identifiers.
- The date on which the division is no longer valid, in the format YYYY, YYYY-MM or YYYY-MM-DD. A division may become invalid if, for example, a political district is abolished.
- The date on which a division becomes valid, in the format YYYY, YYYY-MM or YYYY-MM-DD. A division may become valid if, for example, a political district is created.
- There are no restrictions on other columns.
- An effort should be made to use descriptive CSV filenames.
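The header-row rules and the date formats above could be handled along these lines. This is a sketch under stated assumptions: the function names are hypothetical, and a header row is detected by checking whether the first cell equals id, which is a heuristic rather than something the text prescribes. The date check only validates the shape of the value (YYYY, YYYY-MM or YYYY-MM-DD), not whether it is a real calendar date.

```python
import csv
import io
import re

# Shape of the accepted date formats: YYYY, YYYY-MM or YYYY-MM-DD.
_DATE_SHAPE = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")


def read_division_rows(csv_text: str) -> list[dict[str, str]]:
    """Parse an identifier CSV into dicts keyed by column name."""
    # Skip completely blank lines, which csv.reader yields as empty lists.
    rows = [row for row in csv.reader(io.StringIO(csv_text)) if row]
    if not rows:
        return []
    # Heuristic (an assumption): a first cell equal to "id" marks a header row.
    if rows[0][0] == "id":
        header, body = rows[0], rows[1:]
    else:
        # No header row: two columns with the headers id and name.
        header, body = ["id", "name"], rows
    return [dict(zip(header, row)) for row in body]


def has_valid_date_shape(value: str) -> bool:
    """True if value looks like YYYY, YYYY-MM or YYYY-MM-DD."""
    return _DATE_SHAPE.fullmatch(value) is not None
```

A file with no header row, such as a single line `ocd-division/country:us,United States`, is read as one row with the keys id and name.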
A correction CSV file must contain:
- An incorrect OCD identifier, i.e. an OCD identifier that was never valid.
- The corrected OCD identifier.
- Free-text describing the error, e.g. “missing diacritics”.
- All OCD identifiers are first-class. However, if it is necessary for a system to choose a “primary” or “preferred” identifier for a division, it should use those identifiers with an empty sameAs column.
- The sameAs relationship is symmetric and transitive. The sameAs relationship is not true for all time; it is only true in the present.
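One way a consuming system might enforce the no-daisy-chaining rule and pick a preferred identifier is sketched below. The function names are hypothetical, rows are assumed to be dicts with id and sameAs keys, and the sketch assumes that every sameAs target is defined in the same set of rows, which may not hold when identifiers are spread across several files.

```python
def check_same_as(rows: list[dict[str, str]]) -> list[str]:
    """Return a description of each row that breaks the sameAs rule."""
    by_id = {row["id"]: row for row in rows}
    problems = []
    for row in rows:
        target = row.get("sameAs", "")
        if not target:
            continue
        target_row = by_id.get(target)
        if target_row is None:
            # Assumption: targets are expected to appear in the same row set.
            problems.append(f"{row['id']}: sameAs target {target} is not defined")
        elif target_row.get("sameAs", ""):
            # The target row must itself have a blank sameAs value.
            problems.append(f"{row['id']}: sameAs target {target} has its own sameAs")
    return problems


def preferred_id(identifier: str, rows: list[dict[str, str]]) -> str:
    """Return the identifier whose sameAs column is empty for this division."""
    by_id = {row["id"]: row for row in rows}
    target = by_id[identifier].get("sameAs", "")
    # Chains are forbidden, so a single hop reaches the preferred row.
    return target or identifier
```

Because chains are not allowed, resolving a preferred identifier never requires more than one lookup, which keeps the check and the resolution simple.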
This project has an informal governance structure, led by the project’s early contributors and informed by the Open Civic Data Google Group. Responsibility for a country’s identifiers may be assigned to country-specific organizations.
- United States
- North Carolina
- North Carolina 2nd Congressional District
- North Carolina State Lower Legislative District 1
- Wake County, North Carolina
- Cary, North Carolina (note that although Cary is within Wake County, the county is not part of the identifier because it is not an identifying feature)
- Kildaire Farms Homeowners Association, Cary, North Carolina
- Washington DC, Ward 8
- Washington DC, ANC 4A
- Washington DC, ANC 4A, section 08 (note: this is a strict subset of the ANC for purposes of representation)
- New York City, City Council District 36 (happens to be in Brooklyn, but that is not significant enough to include in the identifier)
- Canadian Federal Electoral District 13004, also known as Fundy Royal (known as Royal from 1914-1966, Fundy-Royal from 1966-2003, and Fundy from 2003-2004; hence the use of a numeric identifier assigned by the government)