NYC Datascape: Help

Methodology

All distance, area, containment, intersection, and bounding box calculations are PostGIS queries run through the CartoDB API (which is super freaking awesome, and without which this project would have been impossible in the allotted time; very special thanks to Andrew @ Vizzuality for personal help).

The mesh is based on 2000 census tract shape files:

http://nycopendata.socrata.com/Government/2000-Census-Blocks-Tracts-GIS-2000-Census-Tracts/zj6d-mjsz

Several of the calculations "fit" 2010 data into the 2000 tracts using the 2010 census tract relationship files:

http://www.census.gov/geo/www/2010census/tract_rel/tract_rel.html
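
As a rough illustration of how 2010 counts might be "fit" into 2000 tracts, the sketch below splits each 2010 tract's count across the 2000 tracts it overlaps, using a share taken from a relationship record. The three-column record layout, the function name, and the sample figures are assumptions made for illustration; the real relationship files have more columns, and the calculations behind this site may differ.

```python
from collections import defaultdict

def fit_to_2000_tracts(values_2010, relationships):
    """Apportion 2010 tract counts onto 2000 tracts.

    values_2010: dict mapping a 2010 tract id to a count (e.g. population).
    relationships: iterable of (tract_2000, tract_2010, share) tuples, where
        share is the fraction of the 2010 tract assigned to that 2000 tract.
        This layout is hypothetical, not the actual file format.
    """
    fitted = defaultdict(float)
    for tract_2000, tract_2010, share in relationships:
        fitted[tract_2000] += values_2010.get(tract_2010, 0.0) * share
    return dict(fitted)

# Made-up example: 2010 tract "Y" straddles 2000 tracts "A" and "B".
relationships = [
    ("A", "X", 1.0),
    ("A", "Y", 0.25),
    ("B", "Y", 0.75),
]
values_2010 = {"X": 1000, "Y": 400}
fitted = fit_to_2000_tracts(values_2010, relationships)
```

This kind of apportionment only makes sense for counts such as population; a median or an average would need to be recomputed from the underlying counts rather than split by share.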

We considered using the even finer "block" level data, but the tract level turned out to be the best compromise between performance and fineness of mesh. Tracts have the additional advantage of being an atomic query concept for the StreetEasy API (discussed later).

The neighborhood names are hand-adapted from this census information, a process that is rather subjective. This page was very helpful:

http://www.nyc.gov/html/dcp/pdf/neighbor/neighbor.pdf

Some tracts are not fine enough to map well to neighborhoods. For example, the neighborhoods of DUMBO and Vinegar Hill are both contained in the same tract, which I (arbitrarily) named "DUMBO."

We used the excellent NHGIS system to browse and retrieve census data.

https://www.nhgis.org/

The census tract polygons are rendered using the Google Earth API.

http://code.google.com/apis/earth/

Residentiality

We use the commercial and industrial zoning data from nycopendata. The tracts are "bluer" the higher the ratio of (area not zoned commercial or industrial / total area).

http://nycopendata.socrata.com/Business-and-Economic/Primary-Commercial-Zoning-by-lot/pwhj-ikym

http://nycopendata.socrata.com/Business-and-Economic/Primary-Manufacturing-Zoning-by-lot/kxg8-856s
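
The residentiality ratio described above can be sketched in a few lines. The areas would come from the PostGIS area and intersection queries; the function names and the 0 to 255 blue ramp are assumptions for illustration only.

```python
def residentiality(total_area, commercial_area, industrial_area):
    """Fraction of a tract's area that is not zoned commercial or industrial."""
    if total_area <= 0:
        return 0.0
    # Cap the zoned area so overlapping zoning polygons cannot push the ratio below 0.
    zoned = min(commercial_area + industrial_area, total_area)
    return (total_area - zoned) / total_area

def blue_channel(score):
    """Map a 0..1 score to a 0..255 blue intensity (hypothetical color ramp)."""
    return round(255 * max(0.0, min(1.0, score)))
```

For example, a tract of 100 area units with 20 units zoned commercial and 5 zoned industrial gets a ratio of 0.75 under this sketch.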

Price

We first used the StreetEasy API (thanks, Sebastian!!!) to determine the average price per square foot for each tract where listings were available. Unfortunately, too many tracts had no listings to create a smooth map, so we instead used the StreetEasy API to determine the StreetEasy "area" containing each tract's centroid and retrieved the average price per square foot for that area. This approach loses a lot of detail but creates a smoother map. The slider decreases the value of tracts whose price per square foot is above the set threshold.

http://streeteasy.com/nyc/api/info
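
A minimal sketch of the area-based price lookup and the slider penalty follows. It does not call the StreetEasy API; the two dictionaries stand in for data already retrieved, and the linear penalty is an assumption, since the text above only says that tracts over the threshold lose value.

```python
def area_price(tract_id, area_of_tract, area_ppsf):
    """Average price per square foot of the area containing the tract's centroid.

    area_of_tract: dict mapping tract id -> area name (hypothetical lookup).
    area_ppsf: dict mapping area name -> average price per square foot.
    Returns None when the tract has no area or the area has no price.
    """
    return area_ppsf.get(area_of_tract.get(tract_id))

def price_penalty(ppsf, threshold, weight):
    """Penalty that grows linearly with the relative overshoot (assumed form)."""
    if ppsf is None or threshold <= 0 or ppsf <= threshold:
        return 0.0
    return weight * (ppsf - threshold) / threshold

# Made-up data for illustration.
area_of_tract = {"tract_1": "DUMBO", "tract_2": "Bay Ridge"}
area_ppsf = {"DUMBO": 1200.0}
```

With these made-up numbers, "tract_1" is priced at 1200.0 and "tract_2" has no price, so it is never penalized.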

Commute

Using published data from MTA (available from the nycopendata site), we constructed a giant table of estimated commute times between any two points on the transportation network. The location drop-down selects a row from this table, and the "How long a commute?" slider sets a threshold for desired maximum commute time to that point. Finally, the "commute" slider sets the penalty for going over the threshold.
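
To make the table and the two sliders concrete, here is a small sketch. It builds one row of travel times with Dijkstra's algorithm over a toy network and then applies a linear penalty past the threshold. Both the use of Dijkstra and the linear penalty are assumptions; the text above does not say how the table was computed or how the penalty is shaped, and travel times are assumed symmetric so that a row from the destination also serves as times to it.

```python
import heapq

def commute_row(graph, source):
    """Shortest estimated travel times (minutes) from source to every stop.

    graph: dict mapping stop -> list of (neighbor, minutes) edges.
    """
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        minutes, stop = heapq.heappop(queue)
        if minutes > best.get(stop, float("inf")):
            continue  # stale queue entry, a shorter route was already found
        for neighbor, edge_minutes in graph.get(stop, []):
            candidate = minutes + edge_minutes
            if candidate < best.get(neighbor, float("inf")):
                best[neighbor] = candidate
                heapq.heappush(queue, (candidate, neighbor))
    return best

def commute_penalty(row, origin, max_minutes, weight):
    """Penalty for exceeding the desired maximum commute (assumed linear)."""
    overage = max(0.0, row.get(origin, float("inf")) - max_minutes)
    if overage == 0.0 or weight == 0:
        return 0.0
    return weight * overage

# Toy network with made-up travel times.
graph = {
    "A": [("B", 10), ("C", 30)],
    "B": [("A", 10), ("C", 15)],
    "C": [("A", 30), ("B", 15)],
}
row = commute_row(graph, "A")
```

Here the location drop-down corresponds to choosing the source "A", the "How long a commute?" slider to max_minutes, and the "commute" slider to weight.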

Safety

NYC publishes monthly statistics for categories of crime in each precinct, which we access through the nycopendata portal. We assigned a weight to each crime (murder and rape times 10, assault and robbery times 2, burglary times 1) and adjusted by precinct population using 2010 census data to arrive at a "weighted crime per capita" metric for each precinct. Then each census tract was assigned the score of the precinct that contains its centroid.

This page was very informative:

http://johnkeefe.net/nyc-police-precinct-and-census-data

As well as this paper:

http://wagner.nyu.edu/news/impactzoning.doc

Tracts with a high score are colored red (overriding the "residentiality" coloring) and are penalized by turning up the safety slider.

http://nycopendata.socrata.com/Public-Safety/NYPD-Public-Indicators/yts9-kmw9
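
The "weighted crime per capita" metric and the centroid assignment can be sketched as follows. The weights are the ones listed above. The containment test on this site is a PostGIS query; the pure-Python ray-casting function below is only a stand-in so the example is self-contained, and all names and sample figures are hypothetical.

```python
CRIME_WEIGHTS = {"murder": 10, "rape": 10, "assault": 2, "robbery": 2, "burglary": 1}

def weighted_crime_per_capita(crime_counts, population):
    """Weighted crime count divided by precinct population."""
    if population <= 0:
        return None
    weighted = sum(CRIME_WEIGHTS.get(crime, 0) * count
                   for crime, count in crime_counts.items())
    return weighted / population

def point_in_polygon(x, y, polygon):
    """Ray-casting containment test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge crosses the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def tract_score(centroid, precinct_polygons, precinct_scores):
    """Score of the first precinct whose polygon contains the tract centroid."""
    x, y = centroid
    for precinct, polygon in precinct_polygons.items():
        if point_in_polygon(x, y, polygon):
            return precinct_scores.get(precinct)
    return None

# Made-up precincts: two adjacent squares.
precinct_polygons = {
    "P1": [(0, 0), (2, 0), (2, 2), (0, 2)],
    "P2": [(2, 0), (4, 0), (4, 2), (2, 2)],
}
precinct_scores = {
    "P1": weighted_crime_per_capita(
        {"murder": 1, "rape": 2, "assault": 10, "robbery": 5, "burglary": 20}, 10000),
    "P2": weighted_crime_per_capita({"burglary": 50}, 25000),
}
```

With these made-up counts, precinct "P1" scores 80 weighted crimes per 10,000 residents, and a tract whose centroid is at (1, 1) inherits that score.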

Cafes

This metric uses data on licensed sidewalk cafes available through the nycopendata portal. Each tract is scored proportionally to the log of the number of sidewalk cafes near the tract.

http://nycopendata.socrata.com/Business-and-Economic/Sidewalk-Cafes/6k68-kc8u
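
A sketch of the cafe score under stated assumptions: "near" is taken to mean within a fixed radius of the tract centroid (500 meters here, a made-up value), distance is computed with the haversine formula instead of PostGIS, and log(1 + n) is used so that tracts with no cafes score zero instead of hitting log(0). None of these choices is confirmed by the text above.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two latitude/longitude points."""
    radius_m = 6371000.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(a))

def cafes_near(centroid, cafes, radius_m=500.0):
    """Count cafes within radius_m of the centroid; points are (lat, lon)."""
    lat, lon = centroid
    return sum(1 for c_lat, c_lon in cafes
               if haversine_m(lat, lon, c_lat, c_lon) <= radius_m)

def cafe_score(cafe_count, scale=1.0):
    """Score proportional to log(1 + count); scale is a hypothetical constant."""
    return scale * math.log1p(cafe_count)

# Made-up coordinates: one cafe about 110 m away, one about 2.2 km away.
centroid = (40.73, -73.99)
cafes = [(40.731, -73.99), (40.75, -73.99)]
```

The logarithm means that the tenth cafe near a tract adds much less to the score than the first one does.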

Parks

Using NYC park data accessed through the nycopendata portal, each tract is scored according to the ratio of (park area in radius) / (total area in radius).

http://nycopendata.socrata.com/Facilities-and-Structures/Map-of-Parks/jc79-4imn

Youth

We get the median age from the 2010 census data. The slider increases the value of tracts whose population's median age is under 40. Professional demographers tell me this is usually considered a weak measure, and in a future version it may be replaced by a calculation of (population between 25 and 44) / (total population).

Minnesota Population Center. National Historical Geographic Information System: Version 2.0. Minneapolis, MN: University of Minnesota 2011.

The College of William and Mary and the Minnesota Population Center. School Attendance Boundary Information System (SABINS): Version 1.0. Minneapolis, MN: University of Minnesota 2011.

Space

This metric uses 2000/2010 census data. The slider increases the value of a "population dispersion" variable (the opposite of population density), which is proportional to the tract area (reduced for non-residential zones) divided by the tract population.

Minnesota Population Center. National Historical Geographic Information System: Version 2.0. Minneapolis, MN: University of Minnesota 2011.

The College of William and Mary and the Minnesota Population Center. School Attendance Boundary Information System (SABINS): Version 1.0. Minneapolis, MN: University of Minnesota 2011.

Schools

We used the "School Progress Reports" and "School Zone Shape Files" available through the nycopendata portal. The progress reports provide parents' "grades" (A-F) of each public school for the last 4 years. We weighted more recent years' results higher than older results to score each school. Then each tract was assigned the score of the school in whose zone the tract's centroid lies.

http://nycopendata.socrata.com/Education/School-Progress-Reports-All-Schools-2011-Multiyear/rwa3-b3wr

http://nycopendata.socrata.com/Education/School-Zones-2011-2012/dqkt-8x6u
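
The recency weighting of school grades could look something like the sketch below. The grade-to-points mapping (A=4 down to F=0) and the weights 1, 2, 3, 4 from oldest to newest year are assumptions; the text above only says that recent years count for more.

```python
GRADE_POINTS = {"A": 4, "B": 3, "C": 2, "D": 1, "F": 0}  # assumed mapping

def school_score(grades_oldest_first, weights=(1, 2, 3, 4)):
    """Weighted average of letter grades, with later years weighted more.

    grades_oldest_first: up to four grades, oldest first; use None for a
        year with no grade. Missing years are skipped and the remaining
        weights are renormalized.
    """
    pairs = [(GRADE_POINTS[grade], weight)
             for grade, weight in zip(grades_oldest_first, weights)
             if grade in GRADE_POINTS]
    total_weight = sum(weight for _, weight in pairs)
    if total_weight == 0:
        return None
    return sum(points * weight for points, weight in pairs) / total_weight
```

For instance, a school graded C, B, B, A over four years scores (2*1 + 3*2 + 3*3 + 4*4) / 10 = 3.3 on this scale, closer to its recent A than a plain average of 3.0 would be.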