This Julia script builds a custom home price index from New York City property sales data covering 2003 through recent months. It differs from Zillow’s price indices in both methodology and flexibility, allowing indices to be computed separately by borough, neighborhood, property type, and price tier.
What the Code Does
- Data Ingestion & Cleaning: It reads historical sales data across boroughs, standardizes column names, categorizes property types (e.g., Coop, Condo, Single-Family Home), and filters out implausible sale prices and duplicate records.
- UID Generation: It builds a unique identifier for each property from its block, lot, and apartment number so that repeat sales of the same unit can be tracked over time.
- Outlier Detection: It flags and removes transactions whose price change between sales is implausibly large given the time elapsed between them.
- Regression-Based Index Calculation: It estimates period-over-period price changes with a repeat-sales regression, weighting each sale pair according to how residual variance grows with the time between sales. The result is a time series of normalized price indices.
- Segmented Indexing: It calculates indices not only for the market as a whole but also by price tier (top and bottom deciles or thirds), borough, property type, and neighborhood.
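The cleaning and UID steps can be sketched in Julia as follows. This is a minimal illustration, not the script's actual code. The `Sale` record, its field names, the building-class substrings used to categorize property types, and the default price bounds are all assumptions. The sketch also prepends the borough code to the block-lot-apartment key, since NYC tax block numbers are only unique within a borough.

```julia
using Dates

# Hypothetical record type; the real script's column names may differ.
struct Sale
    borough::Int
    block::Int
    lot::Int
    apartment::String
    building_class::String   # assumed format, e.g. "13 CONDOS - ELEVATOR APARTMENTS"
    price::Float64
    date::Date
end

# Map a building-class description to a coarse property type (assumed substrings).
function property_type(building_class::AbstractString)
    bc = uppercase(building_class)
    if occursin("COOP", bc)
        return "Coop"
    elseif occursin("CONDO", bc)
        return "Condo"
    elseif occursin("ONE FAMILY", bc)
        return "Single-Family Home"
    else
        return "Other"
    end
end

# Borough-block-lot-apartment key used to match repeat sales of the same unit.
make_uid(s::Sale) = string(s.borough, "-", s.block, "-", s.lot, "-", strip(s.apartment))

# Drop sales outside a plausible price range, then drop exact duplicates
# (same UID, price, and date). The default bounds are hypothetical.
function clean_sales(sales::Vector{Sale}; min_price = 10_000.0, max_price = 1.0e9)
    seen = Set{Tuple{String,Float64,Date}}()
    kept = Sale[]
    for s in sales
        (min_price <= s.price <= max_price) || continue   # implausible price
        key = (make_uid(s), s.price, s.date)
        key in seen && continue                            # duplicate record
        push!(seen, key)
        push!(kept, s)
    end
    return kept
end
```

Nominal transfers (for example a $10 sale between family members) fall below `min_price` and are discarded before any pairs are formed.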
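Outlier screening on repeat-sale pairs might look like the sketch below. The script's exact rule is not spelled out here, so this version assumes one plausible reading: pair consecutive sales of the same UID, annualize the absolute log price change, and flag pairs above a threshold. The `SalePair` type, the 0.35 annual log-change threshold, and the one-year floor on the holding period are hypothetical choices. The floor keeps quick resales from being divided by a tiny interval.

```julia
using Dates

# Hypothetical pair of consecutive sales of one property.
struct SalePair
    uid::String
    date1::Date
    price1::Float64
    date2::Date
    price2::Float64
end

years_between(p::SalePair) = Dates.value(p.date2 - p.date1) / 365.25

# Build pairs from consecutive sales sharing a UID.
function make_pairs(uids::AbstractVector{<:AbstractString},
                    dates::AbstractVector{Date},
                    prices::AbstractVector{<:Real})
    by_uid = Dict{String,Vector{Int}}()
    for i in eachindex(uids)
        push!(get!(by_uid, uids[i], Int[]), i)
    end
    pairs = SalePair[]
    for (uid, idx) in by_uid
        sort!(idx; by = i -> dates[i])
        for k in 1:length(idx)-1
            i, j = idx[k], idx[k+1]
            dates[j] > dates[i] || continue   # same-day resales carry no time signal
            push!(pairs, SalePair(uid, dates[i], prices[i], dates[j], prices[j]))
        end
    end
    return pairs
end

# Flag a pair whose annualized absolute log price change exceeds the threshold.
function is_outlier(p::SalePair; max_annual_log_change::Real = 0.35, min_years::Real = 1.0)
    yrs = max(years_between(p), min_years)
    return abs(log(p.price2 / p.price1)) / yrs > max_annual_log_change
end
```

Dropping the flagged pairs is then `filter(!is_outlier, pairs)`. Raising or lowering the keyword thresholds changes what counts as an anomalous price movement.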
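The weighted regression can be sketched with the three-stage procedure associated with Case-Shiller. First, an unweighted repeat-sales regression is fit. Second, its squared residuals are regressed on the holding period. Third, the model is refit by weighted least squares using the inverse of that predicted variance. This assumes the script's weighting follows this standard approach. The function name, the base-period convention (period 1 = 100), and the `min_var` floor are illustrative. Every period must appear in at least one pair for its index value to be identified.

```julia
using LinearAlgebra

# Hypothetical helper: geometric weighted repeat-sales index over periods 1..T.
# t1, t2 are the integer periods of the first and second sale; p1, p2 the prices.
function repeat_sales_index(t1::AbstractVector{<:Integer}, t2::AbstractVector{<:Integer},
                            p1::AbstractVector{<:Real}, p2::AbstractVector{<:Real},
                            T::Integer; min_var::Real = 1e-4)
    n = length(t1)
    y = log.(p2) .- log.(p1)              # log price relative of each pair
    X = zeros(n, T - 1)                   # period 1 is the base, so it has no column
    for i in 1:n
        t2[i] > 1 && (X[i, t2[i] - 1] += 1.0)
        t1[i] > 1 && (X[i, t1[i] - 1] -= 1.0)
    end
    # Stage 1: unweighted least squares.
    β = X \ y
    e = y .- X * β
    # Stage 2: model residual variance as a linear function of the holding period.
    gap = Float64.(t2 .- t1)
    Z = hcat(ones(n), gap)
    γ = Z \ (e .^ 2)
    v = max.(Z * γ, min_var)              # guard against zero or negative predicted variance
    # Stage 3: weighted least squares with weights 1 / predicted variance.
    sw = sqrt.(1 ./ v)
    β = (X .* sw) \ (y .* sw)
    return 100 .* exp.(vcat(0.0, β))      # base period normalized to 100
end
```

For a rectangular matrix, `X \ y` returns a least-squares solution computed by pivoted QR, so the normal equations are never formed explicitly. Pairs with long holding periods typically accumulate more idiosyncratic noise, and the stage-two variance model downweights them accordingly.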
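Segmentation amounts to splitting the sale pairs into groups and running the same regression on each group. The helpers below are a hypothetical sketch. `price_tiers` assigns each pair to a tier using quantile cutoffs of the first-sale price within that sale's period. That is an assumption, and the script may define tiers differently. `group_rows` collects row indices for any label, such as borough, neighborhood, or property type.

```julia
using Statistics

# Assign tier 1..ntiers by first-sale price, with cutoffs computed per period
# so that tiers stay relative to the market level at the time of purchase.
function price_tiers(t1::AbstractVector{<:Integer}, p1::AbstractVector{<:Real}, ntiers::Integer)
    tiers = zeros(Int, length(p1))
    for t in unique(t1)
        rows = findall(==(t), t1)
        cuts = quantile(p1[rows], (1:ntiers-1) ./ ntiers)
        for r in rows
            tiers[r] = 1 + count(c -> p1[r] > c, cuts)
        end
    end
    return tiers
end

# Map each distinct label (borough, neighborhood, tier, ...) to its row indices.
function group_rows(labels::AbstractVector)
    groups = Dict{eltype(labels),Vector{Int}}()
    for (i, k) in enumerate(labels)
        push!(get!(groups, k, Int[]), i)
    end
    return groups
end
```

With `ntiers = 10`, tiers 1 and 10 are the bottom and top deciles. With `ntiers = 3`, they are thirds. Each group's rows can then be passed to the repeat-sales regression on their own.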
How It’s Different from Zillow’s Price Index
Feature | This Code | Zillow Home Value Index (ZHVI) |
---|---|---|
Data Source | NYC public sales transaction data | Proprietary data, including listings |
Methodology | Repeat-sales regression (like Case-Shiller) | AVM (automated valuation model) |
Outlier Filtering | Custom logic using log price change | Internal filters (opaque) |
Segmentation | Fine-grained (borough, neighborhood, price tiers) | Limited segmentation via dashboards |
Transparency | Fully auditable and modifiable | Black-box methodology |
Why It’s More Customizable
- Open & Extensible: Easily adjust filters (e.g., sale price ranges, property types), time resolution, or regression methods.
- Fine Control Over Segmentation: Custom groupings like “top third” or specific neighborhoods enable hyper-local insights.
- Tailored Outlier Detection: Change thresholds for what counts as an anomalous price movement.
- Reproducibility: Every transformation is explicit and reproducible, unlike opaque third-party indices.
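The customization points listed above could be collected into a single configuration object. The sketch below is hypothetical: `IndexConfig`, its field names, and its default values are illustrative and not taken from the script. `period_number` shows how a time-resolution setting might map sale dates to the integer periods that a repeat-sales regression uses.

```julia
using Dates

# Hypothetical settings object; every default here is illustrative.
Base.@kwdef struct IndexConfig
    min_price::Float64 = 10_000.0
    max_price::Float64 = 1.0e9
    property_types::Vector{String} = ["Coop", "Condo", "Single-Family Home"]
    max_annual_log_change::Float64 = 0.35
    resolution::Symbol = :quarter          # :month or :quarter
    start_date::Date = Date(2003, 1, 1)
end

# Map a sale date to a 1-based period number at the configured resolution.
function period_number(d::Date, cfg::IndexConfig)
    months = 12 * (year(d) - year(cfg.start_date)) + (month(d) - month(cfg.start_date))
    if cfg.resolution == :month
        return months + 1
    elseif cfg.resolution == :quarter
        return fld(months, 3) + 1
    else
        throw(ArgumentError("unsupported resolution: $(cfg.resolution)"))
    end
end
```

Switching to a monthly index is then `IndexConfig(resolution = :month)`. Monthly periods give finer timing but fewer sale pairs per period, so the estimates are noisier.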
Use Cases
- Policy analysis by borough or income level
- Neighborhood-specific investment tracking
- Academic housing studies
- Custom reporting for real estate professionals
Conclusion
This code provides a transparent, flexible, and research-grade alternative to Zillow’s ZHVI, ideal for users needing precise control over methodology and data scope.