
DATA SOURCES

 

  • The original data contained information on about 12,000 houses, organized by zip code

  • After cleaning, the final dataset used in the analysis contains 8,732 rows and 34 variables

    • Zillow: 8 variables

    • NYU Furman Center: 23 variables

    • Distance & Duration: 3 variables

DATA PREVIEW
 
ZILLOW DATA
 
  • Extracted the data from Zillow through web scraping

  • Variables: 

    • Zip Code, Number of Beds, Number of Baths, Home Size, House Type, Year Built, Number of Days Listed for Sale on Zillow

EXTERNAL DATA

 

  • The data is available by community district, each consisting of a few zip codes

  • Variables:

    • Racial Diversity Index, Poverty Rate, Labor Participation Rate, Unemployment Rate, Proportion of People with a Bachelor's Degree, Proportion of People without a High School Education, Proportion of Single Households, Proportion of People Born in NYC, Proportion of Different Ethnicities, Proportion of Disabled People, Income Diversity, Index of Housing Price Appreciation, Serious Housing Code Violations, Home Ownership Rate, Population Density, Serious Crime Rate, Car-Free Commute Rate, Mean Travel Time to Work, Median Rent Asked by Landlords, Housing Choice Voucher Rate, and others.
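Because the external variables are reported by community district while the listings are keyed by zip code, the two sources have to be linked through some zip-to-district mapping. The sketch below shows one possible way to do that join in plain Python. The crosswalk, the district names, the field names, and every value are hypothetical placeholders, not the actual data or the method used in this project.

```python
# Minimal sketch: attach district-level variables to zip-level listings.
# All keys and values below are hypothetical placeholders, not real data.

def attach_district_variables(listings, zip_to_district, district_stats):
    """Return listings enriched with the variables of their community district.

    Listings whose zip code has no district match are dropped.
    """
    merged = []
    for row in listings:
        district = zip_to_district.get(row["zip_code"])
        stats = district_stats.get(district)
        if stats is None:
            continue  # no district-level data for this zip code
        merged.append({**row, "community_district": district, **stats})
    return merged


# Hypothetical crosswalk: several zip codes map to one community district.
zip_to_district = {"ZIP_A": "DISTRICT_1", "ZIP_B": "DISTRICT_1", "ZIP_C": "DISTRICT_2"}

# Hypothetical district-level variables (placeholder numbers).
district_stats = {
    "DISTRICT_1": {"poverty_rate": 0.10, "home_ownership_rate": 0.30},
    "DISTRICT_2": {"poverty_rate": 0.20, "home_ownership_rate": 0.25},
}

# Hypothetical listings keyed by zip code.
listings = [
    {"zip_code": "ZIP_A", "beds": 2},
    {"zip_code": "ZIP_C", "beds": 1},
    {"zip_code": "ZIP_X", "beds": 3},  # no match in the crosswalk
]

merged = attach_district_variables(listings, zip_to_district, district_stats)
```

Under this approach, every zip code in the same district receives identical district-level values, which is worth keeping in mind when interpreting those variables.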

DISTANCE DATA

  • Through text mining, we noticed that some terms appear frequently, including Central Park, Shopping, and Wall Street

  • Used the latitude and longitude of every zip code to find the durations and distances from each zip code to landmarks such as Central Park, SoHo, and the Financial District
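As an illustration of the distance step, the sketch below computes straight-line (great-circle) distances from a point to each landmark with the haversine formula. This is an assumption about one way to do it, not necessarily the method used here: the landmark coordinates are approximate, the example zip code centroid is hypothetical, and travel durations are not covered because they would require a routing service.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Approximate landmark coordinates (assumed for illustration).
LANDMARKS = {
    "Central Park": (40.7829, -73.9654),
    "SoHo": (40.7233, -74.0030),
    "Financial District": (40.7075, -74.0113),
}


def distances_to_landmarks(lat, lon):
    """Distance in kilometers from one point to every landmark."""
    return {
        name: haversine_km(lat, lon, landmark_lat, landmark_lon)
        for name, (landmark_lat, landmark_lon) in LANDMARKS.items()
    }


# Hypothetical zip code centroid.
example = distances_to_landmarks(40.7484, -73.9967)
```

Straight-line distance ignores the street grid, bridges, and transit, so it understates real travel distance; it is best read as a rough proxy.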
