Stock Analysis

This is an AM01 Applied statistics project undertaken by my study group to returns of few financial stocks. We have extracted and plotted the data for companies per sector from the given data set. Monthly summary of returns from each stock is calculated in our analysis. Risk analysis of stocks has been done based on the data provided to understand the risk involved in the stocks under analysis.

Returns of financial stocks

# read data from dataset

nyse <- read_csv(here::here("data","nyse.csv"))

Visualisation of number of companies per sector based on the dataset

nyse_sectors <- nyse %>% 
  count(sector, sort=TRUE) %>% 
  mutate(sector = fct_reorder(sector, n))

kable(nyse_sectors, caption = "Number of Companies per Sector")

(#tab:companies_per_sector)Number of Companies per Sector
sector	n
Finance	97
Consumer Services	79
Public Utilities	60
Capital Goods	45
Health Care	45
Energy	42
Technology	40
Basic Industries	39
Consumer Non-Durables	31
Miscellaneous	12
Transportation	10
Consumer Durables	8

ggplot(nyse_sectors, aes(x = n, y = sector)) +
  geom_col(fill = "#CB454A") + 
  labs(
    title ="Number of Companies per Sector", 
    x="number of companies",
    y="sectors",
    caption = "Kostis Christodoulou (2022) Finance Data"
     )

# Notice the cache=TRUE argument inthe chunk options. Because getting data is time consuming, 
# cache=TRUE means that once it downloads data, the chunk will not run again next time you knit your Rmd

myStocks <- c("AAPL", "JPM", "DIS","DPZ","ANF","TSLA","SPY" ) %>%
  tq_get(get  = "stock.prices",
         from = "2011-01-01",
         to   = "2022-08-31") %>%
  group_by(symbol) 

glimpse(myStocks) # examine the structure of the resulting data frame

## Rows: 20,545
## Columns: 8
## Groups: symbol [7]
## $ symbol   <chr> "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL", "AAPL…
## $ date     <date> 2011-01-03, 2011-01-04, 2011-01-05, 2011-01-06, 2011-01-07, …
## $ open     <dbl> 11.6, 11.9, 11.8, 12.0, 11.9, 12.1, 12.3, 12.3, 12.3, 12.4, 1…
## $ high     <dbl> 11.8, 11.9, 11.9, 12.0, 12.0, 12.3, 12.3, 12.3, 12.4, 12.4, 1…
## $ low      <dbl> 11.6, 11.7, 11.8, 11.9, 11.9, 12.0, 12.1, 12.2, 12.3, 12.3, 1…
## $ close    <dbl> 11.8, 11.8, 11.9, 11.9, 12.0, 12.2, 12.2, 12.3, 12.3, 12.4, 1…
## $ volume   <dbl> 445138400, 309080800, 255519600, 300428800, 311931200, 448560…
## $ adjusted <dbl> 10.05, 10.10, 10.18, 10.18, 10.25, 10.44, 10.42, 10.50, 10.54…

#calculate daily returns
myStocks_returns_daily <- myStocks %>%
  tq_transmute(select     = adjusted, 
               mutate_fun = periodReturn, 
               period     = "daily", 
               type       = "log",
               col_rename = "daily_returns",
               cols = c(nested.col)) 

#calculate monthly  returns
myStocks_returns_monthly <- myStocks %>%
  tq_transmute(select     = adjusted, 
               mutate_fun = periodReturn, 
               period     = "monthly", 
               type       = "arithmetic",
               col_rename = "monthly_returns",
               cols = c(nested.col)) 

#calculate yearly returns
myStocks_returns_annual <- myStocks %>%
  #group_by(symbol) %>%
  tq_transmute(select     = adjusted, 
               mutate_fun = periodReturn, 
               period     = "yearly", 
               type       = "arithmetic",
               col_rename = "yearly_returns",
               cols = c(nested.col))

Summary of monthly returns for each of the stocks and `SPY`

kable(myStocks_returns_monthly %>% 
  group_by(symbol) %>% 
  summarize(min_return = min(monthly_returns), max_return = max(monthly_returns), median_return = median(monthly_returns), mean_return = mean(monthly_returns), sd_return = sd(monthly_returns)), caption = "Monthly Returns for Stocks")

(#tab:summarise_monthly_returns)Monthly Returns for Stocks
symbol	min_return	max_return	median_return	mean_return	sd_return
AAPL	-0.181	0.217	0.023	0.023	0.079
ANF	-0.421	0.507	0.001	0.003	0.146
DIS	-0.186	0.234	0.007	0.011	0.072
DPZ	-0.194	0.342	0.025	0.027	0.077
JPM	-0.229	0.202	0.020	0.012	0.073
SPY	-0.125	0.127	0.015	0.011	0.040
TSLA	-0.224	0.811	0.012	0.050	0.177

Density plot for all the stocks

ggplot(myStocks_returns_monthly, aes(x=monthly_returns)) +
  geom_density() +
  facet_wrap(~symbol) + 
  labs(
    title ="Density plot of monthly returns for each stock", 
    x="monthly returns",
    y="density",
    caption = "Kostis Christodoulou (2022) Finance Data"
     )

Inference

Analysis of risk involved in the stocks that are analysed

This density plot shows the distribution of the companies’ monthly returns. A broad curve with greater spread means that we have a large variance in outcome and thus higher risk. Therefore, it appears that Tesla is the riskiest asset in our portfolio as it has a very large spread of monthly returns and the maximum density of a possible return value is below 3. As a tech company, Tesla faces great business uncertainties (e.g. technologicla breakthroughs). SP500 ETF is the least risky one, as a “spike” is shown in the diagram, its monthly returns are very concentrated, reaching the density above 12. Meanwhile its spread is very narrow. This is because the stock is an index ETF which is made up by the selected index stocks in the market and generally reflects the risk of the entire stock market across multiple industries which diversifies the risk.

Visual depiction of expected monthly return (mean) of a stock and the risk (standard deviation) all each stock

mean_sd_return <- myStocks_returns_monthly %>% 
  group_by(symbol) %>% 
  summarize(mean_return = mean(monthly_returns), sd_return = sd(monthly_returns))

ggplot(mean_sd_return, aes(x=sd_return, y = mean_return, label=symbol))+
  geom_point(shape=23, fill="black") +
  geom_text_repel(fontface = "bold") +
  labs(
    title = "return vs risk of stocks", 
    x="standard deviation of monthly return",
    y="mean of monthly return",
    caption = "Kostis Christodoulou (2022) Finance Data"
  )

What can you infer from this plot? Are there any stocks which, while being riskier, do not have a higher expected return?

Answer: Standard deviation can be used to show the volatility of the stocks, and therefore the risks beared by the investors. Most companies’ stocks follow the rule of “risk vs return”, which means that those with higher risks should generate higher returns to compensate. For examples, S&P 500 ETF generates less risk for less return whearas Tesla generates extremely high return for high risk. Other stocks similarly follow this rule. However, Abercrombie & Fitch is risky but still provides low returns. This poor performance aligns with the bad news on the company over the recent years, including the declining sales from 2020 to 2022, scandals of discriminatory hiring practices, store closure etc.