San Francisco Rent Analysis

This is an AM01 Applied Statistics project undertaken by my study group to analyse rent trends in San Francisco and the wider Bay Area in the United States. We examined the raw data to understand the variable types and the extent of missing data, then identified the top 20 cities by share of classifieds between 2000 and 2018. The project also charts the evolution of median prices in San Francisco for 0-, 1-, 2-, and 3-bedroom listings, with a similar analysis for the top 12 cities in the Bay Area.

Rents in San Francisco 2000-2018

# load the packages used throughout the analysis
library(tidyverse)
library(skimr)

# download directly from the tidytuesday GitHub repo
rent <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2022/2022-07-05/rent.csv')

What are the variable types? Does each type match what the variable actually represents? Which variables have the most missing values?

skim(rent)
Data summary
Name rent
Number of rows 200796
Number of columns 17
_______________________
Column type frequency:
character 8
numeric 9
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
post_id 0 1.00 9 14 0 200796 0
nhood 0 1.00 4 43 0 167 0
city 0 1.00 5 19 0 104 0
county 1394 0.99 4 13 0 10 0
address 196888 0.02 1 38 0 2869 0
title 2517 0.99 2 298 0 184961 0
descr 197542 0.02 13 16975 0 3025 0
details 192780 0.04 4 595 0 7667 0

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100 hist
date 0 1.00 20095718.38 44694.07 20000902.0 20050227.0 20110924.0 20120805.0 20180717.0 ▁▇▁▆▃
year 0 1.00 2009.51 4.48 2000.0 2005.0 2011.0 2012.0 2018.0 ▁▇▁▆▃
price 0 1.00 2135.36 1427.75 220.0 1295.0 1800.0 2505.0 40000.0 ▇▁▁▁▁
beds 6608 0.97 1.89 1.08 0.0 1.0 2.0 3.0 12.0 ▇▂▁▁▁
baths 158121 0.21 1.68 0.69 1.0 1.0 2.0 2.0 8.0 ▇▁▁▁▁
sqft 136117 0.32 1201.83 5000.22 80.0 750.0 1000.0 1360.0 900000.0 ▇▁▁▁▁
room_in_apt 0 1.00 0.00 0.04 0.0 0.0 0.0 0.0 1.0 ▇▁▁▁▁
lat 193145 0.04 37.67 0.35 33.6 37.4 37.8 37.8 40.4 ▁▁▅▇▁
lon 196484 0.02 -122.21 0.78 -123.2 -122.4 -122.3 -122.0 -74.2 ▇▁▁▁▁
glimpse(rent)
## Rows: 200,796
## Columns: 17
## $ post_id     <chr> "pre2013_134138", "pre2013_135669", "pre2013_127127", "pre…
## $ date        <dbl> 20050111, 20050126, 20041017, 20120601, 20041021, 20060411…
## $ year        <dbl> 2005, 2005, 2004, 2012, 2004, 2006, 2007, 2017, 2009, 2006…
## $ nhood       <chr> "alameda", "alameda", "alameda", "alameda", "alameda", "al…
## $ city        <chr> "alameda", "alameda", "alameda", "alameda", "alameda", "al…
## $ county      <chr> "alameda", "alameda", "alameda", "alameda", "alameda", "al…
## $ price       <dbl> 1250, 1295, 1100, 1425, 890, 825, 1500, 2925, 450, 1395, 1…
## $ beds        <dbl> 2, 2, 2, 1, 1, 1, 1, 3, NA, 2, 2, 5, 4, 0, 4, 1, 3, 3, 1, …
## $ baths       <dbl> 2, NA, NA, NA, NA, NA, 1, NA, 1, NA, NA, NA, 3, NA, NA, NA…
## $ sqft        <dbl> NA, NA, NA, 735, NA, NA, NA, NA, NA, NA, NA, 2581, 1756, N…
## $ room_in_apt <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0…
## $ address     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ lat         <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 37.5, NA, …
## $ lon         <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ title       <chr> "$1250 / 2br - 2BR/2BA   1145 ALAMEDA DE LAS PULGAS", "$12…
## $ descr       <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
## $ details     <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, "<p class=…

Answer:

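  • Variable types: character (chr, 8 columns) and double (dbl, 9 columns)
  • Do they correspond: mostly, but date is stored as a number (e.g. 20050111) and would be better represented as a date type
  • Most missing variable: descr (197,542 missing values), followed by address (196,888) and lon (196,484)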
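As a minimal sketch of the date fix suggested above, the numeric `date` column could be parsed with `lubridate::ymd()`. The small tibble below is a stand-in built from the first three values shown in the `glimpse()` output, and the name `rent_dates` is made up for this illustration. This is one possible approach, not necessarily what the study group did.

```r
library(dplyr)
library(lubridate)

# Stand-in for the `date` column: the first three values from glimpse(rent)
rent_dates <- tibble(date = c(20050111, 20050126, 20041017)) %>%
  # ymd() parses numbers of the form YYYYMMDD into Date objects
  mutate(date = ymd(date))

rent_dates
```

Applied to the full data, the same `mutate(date = ymd(date))` step would presumably slot in straight after `read_csv()`.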

Plot of the top 20 cities by share of classifieds between 2000 and 2018. The number of listings per city is counted first and then converted to a percentage of all listings.

top20 <- rent %>% 
  count(city, sort=TRUE) %>% 
  mutate(proportion = n/sum(n)) %>% 
  slice_max(order_by = proportion, n=20) %>% 
  mutate(city = fct_reorder(city, proportion))
  
ggplot(data = top20, mapping = aes(x=proportion, y=city)) +
  geom_col() +
  labs(
    title = "San Francisco accounts for more than a quarter of all rental classifieds",
    subtitle = "% of Craigslist listings, 2000-2018",
    x = NULL,
    y = NULL,
    caption="Source: Pennington, Kate (2018). Bay Area Craigslist Rental Housing Posts, 2000-2018"
  ) +
  scale_x_continuous(labels = scales::percent) +
  theme_light() +
  theme(panel.border = element_blank())+
  theme(plot.title = element_text(hjust = -0.35))+
  theme(plot.subtitle = element_text(hjust = -0.15))

Evolution of median rents in San Francisco for 0-, 1-, 2-, and 3-bedroom listings.

median_per_bed <- rent %>% 
  filter(beds <= 3, city == "san francisco") %>% 
  group_by(beds, year) %>% 
  # drop the grouping after summarising so the result is an ungrouped tibble
  summarize(median_price = median(price), .groups = "drop")

ggplot(median_per_bed, aes(x=year, y=median_price, color = factor(beds))) +
  geom_line() +
  facet_wrap(~beds, nrow = 1) +
  labs(
    title = "San Francisco rents have steadily been increasing",
    subtitle = "0 to 3-bed listings, 2000-2018",
    x=NULL,
    y=NULL,
    caption = "Source: Pennington, Kate (2018). Bay Area Craigslist Rental Housing Posts, 2000-2018"
  ) +
  xlim(2003,2018) +
  theme_light() +
  theme(legend.position="none") +
  theme(plot.title = element_text(hjust = 0))+
  theme(plot.subtitle = element_text(hjust = 0)) +
  theme(strip.text.x = element_text(colour = "black")) +
  # linewidth replaces the deprecated size argument (ggplot2 >= 3.4.0)
  theme(panel.border = element_rect(color = "black", fill = NA, linewidth = 0.5)) +
  theme(strip.background = element_rect(color = "black", linewidth = 0.5))

Median rental prices for 1-bedroom listings in the top 12 Bay Area cities, ranked by number of listings.

# names of the 12 cities with the most listings
cities_to_include <- rent %>% 
  count(city, name = "number_ads") %>%
  slice_max(order_by = number_ads, n = 12) %>%
  pull(city)

# use a separate name so the San Francisco result above is not overwritten
median_per_city <- rent %>% 
  filter(beds == 1, city %in% cities_to_include) %>% 
  group_by(city, year) %>% 
  summarize(median_price = median(price), .groups = "drop")

ggplot(median_per_city, aes(x=year, y=median_price, color = city)) +
  geom_line() +
  facet_wrap(~city, nrow = 3) +
  labs(
    title = "Rental prices for 1-bedroom flats in the Bay Area",
    x=NULL,
    y=NULL,
    caption="Source: Pennington, Kate (2018). Bay Area Craigslist Rental Housing Posts, 2000-2018"
  ) +
  theme_light() +
  theme(legend.position="none") +
  theme(plot.title = element_text(hjust = 0)) +
  theme(strip.text.x = element_text(colour = "black")) +
  # linewidth replaces the deprecated size argument (ggplot2 >= 3.4.0)
  theme(panel.border = element_rect(color = "black", fill = NA, linewidth = 0.5)) +
  theme(strip.background = element_rect(color = "black", linewidth = 0.5))

Inference

The graphs above show that rental prices have risen since 2000. Inflation is probably one major cause, but large tech companies have also made the Bay Area more attractive, which makes living there more expensive.
The figures also show that growth in the median rent slowed, and in some cases turned negative, after 2015. One possible explanation is that more people are leaving California as the boom loses momentum. A CNBC video outlines the reasons for this exodus.
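The slowdown described above is read off the charts by eye. One way to check it numerically is to compute year-over-year growth in the median rent with `dplyr::lag()`. The sketch below uses made-up numbers and the hypothetical names `toy_medians` and `yoy_growth`, purely to show the calculation. It is not a result from the dataset.

```r
library(dplyr)

# Toy median rents: made-up numbers for illustration only, not from the data
toy_medians <- tibble(
  year = 2013:2016,
  median_price = c(1000, 1100, 1210, 1210)
)

yoy_growth <- toy_medians %>%
  arrange(year) %>%
  # growth relative to the previous year; the first year has no predecessor
  mutate(growth = median_price / lag(median_price) - 1)

yoy_growth
```

On the real data, the same `mutate()` step could be applied to the yearly medians after grouping by city or bedroom count, so that each series is lagged against its own previous year.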