Today we will make the plots that are present in this webpage that tracks the situation in New York City - New York City Coronavirus Map and Case Count

There are two main plots -

  1. Cases per capita
  2. Plot analyzing the relationship b/w average household size, income and cases per capita

They have since corrected the graph -

Old

old chart

New

new chart

#hide_output
import pandas as pd
import altair as alt
import geopandas as gpd
alt.renderers.set_embed_options(actions=False)

Fortunately NYC Department of Health and Mental Hygiene publishes their data in their GitHub Repo. It has all the data from cases to the shapefiles too.

But we will use the data from NYC Open Data which is equally good. We will use the NYC Department of Health and Mental Hygiene GitHub repo only for their COVID data per MODZCTA.

The location of the dataset on NYC Open Data portal is https://data.cityofnewyork.us/Health/Modified-Zip-Code-Tabulation-Areas-MODZCTA-/pri4-ifjk which we will export as geojson.

import requests
resp = requests.get('https://data.cityofnewyork.us/api/geospatial/pri4-ifjk?method=export&format=GeoJSON')
data = resp.json()
nyc_zip = gpd.GeoDataFrame.from_features(data['features'])
nyc_zip.head()
geometry label modzcta pop_est zcta
0 MULTIPOLYGON (((-73.98774 40.74407, -73.98819 ... 10001, 10118 10001 23072 10001, 10119, 10199
1 MULTIPOLYGON (((-73.99750 40.71407, -73.99709 ... 10002 10002 74993 10002
2 MULTIPOLYGON (((-73.98864 40.72293, -73.98876 ... 10003 10003 54682 10003
3 MULTIPOLYGON (((-74.00827 40.70772, -74.00937 ... 10004 10004 3028 10004
4 MULTIPOLYGON (((-74.00783 40.70309, -74.00786 ... 10005 10005 8831 10005, 10271
# If you have the data locally
#modzcta = 'MODZCTA/Modified Zip Code Tabulation Areas.geojson'
#nyc_zip_shp = gpd.read_file(modzcta)

This is how it looks when colored based on population -

alt.Chart(nyc_zip).mark_geoshape().encode(
    color='pop_est:Q'
)

Now we will get the Median Household Income data from Census Reporter's wonderful website. In particular the url query for the 5 boroughs in NYC is https://censusreporter.org/data/table/?table=B19013&geo_ids=860%7C05000US36061,860%7C05000US36047,860%7C05000US36081,860%7C05000US36005,860%7C05000US36085 which I came to know thanks to this comment in NYC Health covid data repo.

Download it and export it to work with it further (I chose the Shapefile, but I suppose others would be fine too).

median_income_path = 'MODZCTA/median_income_nyc/acs2018_5yr_B19013_86000US11417/acs2018_5yr_B19013_86000US11417.shp' #push it to datasets repo and link here
median_income = gpd.read_file(median_income_path)
median_income.head()
geoid name B19013001 B19013001e geometry
0 86000US10001 10001 88526.0 8060.0 POLYGON ((-74.00828 40.75027, -74.00783 40.751...
1 86000US10002 10002 35859.0 2149.0 POLYGON ((-73.99750 40.71407, -73.99709 40.714...
2 86000US10003 10003 112131.0 13190.0 POLYGON ((-73.99937 40.73132, -73.99911 40.731...
3 86000US10004 10004 157645.0 17195.0 MULTIPOLYGON (((-73.99814 40.70152, -73.99617 ...
4 86000US10005 10005 173333.0 33390.0 POLYGON ((-74.01251 40.70677, -74.01195 40.707...

The regions in median income data that are not part of NTC's modzcta can be seen as follows(basically we are finding out if both datasets show the same regions) -

median_income[median_income['name'].isin(nyc_zip['modzcta']) == False].plot()
<AxesSubplot:>

Fun Fact - the areas marked above are also NOT SHOWN in NYT's Charts

If you want you can plot the Median Income Data too using -

alt.Chart(median_income).mark_geoshape().encode(
    color='B19013001:Q'
)

Now we will get the COVID data per MODZCTA

covid_zip_uri = 'https://raw.githubusercontent.com/nychealth/coronavirus-data/master/data-by-modzcta.csv'
covid_zip = pd.read_csv(covid_zip_uri)
covid_zip.head()
MODIFIED_ZCTA NEIGHBORHOOD_NAME BOROUGH_GROUP COVID_CASE_COUNT COVID_CASE_RATE POP_DENOMINATOR COVID_DEATH_COUNT COVID_DEATH_RATE PERCENT_POSITIVE TOTAL_COVID_TESTS
0 10001 Chelsea/NoMad/West Chelsea Manhattan 440 1867.33 23563.03 26 110.34 7.18 6124
1 10002 Chinatown/Lower East Side Manhattan 1318 1717.14 76755.41 161 209.76 8.33 15831
2 10003 East Village/Gramercy/Greenwich Village Manhattan 534 992.54 53801.62 35 65.05 3.91 13671
3 10004 Financial District Manhattan 40 1095.71 3650.61 1 27.39 5.33 751
4 10005 Financial District Manhattan 94 1119.57 8396.11 2 23.82 4.91 1914

Merge the covid data with NYC shapefile data to be able to see the number of cases per modzcta

def modify_merge(nyc_zip, covid_zip):
    covid_zip = covid_zip.loc[:, ['MODIFIED_ZCTA', 'BOROUGH_GROUP', 'COVID_CASE_COUNT', 'COVID_DEATH_COUNT', 'NEIGHBORHOOD_NAME']]
    nyc_zip['pop_est'] = nyc_zip['pop_est'].astype(int)
    nyc_zip['modzcta'] = nyc_zip['modzcta'].astype(int)
    covid_zip.rename(columns = {'MODIFIED_ZCTA': 'modzcta'}, inplace=True)
    covid_nyc_zip = nyc_zip.merge(covid_zip, how='left', on='modzcta')
    return covid_nyc_zip
covid_nyc_zip = modify_merge(nyc_zip, covid_zip)
covid_nyc_zip.head()
geometry label modzcta pop_est zcta BOROUGH_GROUP COVID_CASE_COUNT COVID_DEATH_COUNT NEIGHBORHOOD_NAME
0 MULTIPOLYGON (((-73.98774 40.74407, -73.98819 ... 10001, 10118 10001 23072 10001, 10119, 10199 Manhattan 440.0 26.0 Chelsea/NoMad/West Chelsea
1 MULTIPOLYGON (((-73.99750 40.71407, -73.99709 ... 10002 10002 74993 10002 Manhattan 1318.0 161.0 Chinatown/Lower East Side
2 MULTIPOLYGON (((-73.98864 40.72293, -73.98876 ... 10003 10003 54682 10003 Manhattan 534.0 35.0 East Village/Gramercy/Greenwich Village
3 MULTIPOLYGON (((-74.00827 40.70772, -74.00937 ... 10004 10004 3028 10004 Manhattan 40.0 1.0 Financial District
4 MULTIPOLYGON (((-74.00783 40.70309, -74.00786 ... 10005 10005 8831 10005, 10271 Manhattan 94.0 2.0 Financial District

Customary covid cases visualization per modfied zip code tabulation area -

alt.Chart(covid_nyc_zip).mark_geoshape().encode(
    color='COVID_CASE_COUNT'
)