--- title: "Analyzing Hass USA" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Analyzing Hass USA} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>" ) ``` The {avocado} package provides a weekly summary - starting from January 2017 through November 2020 - of Hass Avocado sales. There are three datasets in this package and let's start with the dataset `hass_usa` which focuses on weekly avocado sales in the contiguous US. Let's start by loading the package - along with {dplyr} (for data wrangling) and {ggplot} (for data visualization) - and exploring it's structure ```{r setup} library(avocado) library(dplyr) library(ggplot2) data('hass_usa') dplyr::glimpse(hass_usa) ``` ## Exploratory Data Analysis Let's begin by exploring the following two topics: - Fluctuation of average selling price - Non Organic Avocado sales revenue ### Fluctuation of Average Selling Price ```{r fig.height=5, fig.width=8} hass_usa %>% ggplot(aes(x = week_ending)) + geom_line(aes(y = avg_price_nonorg, color = 'Non Organic')) + geom_line(aes(y = avg_price_org, color = 'Organic')) + scale_color_manual(values = c('steelblue','forestgreen')) + labs(x = 'Year', y = 'Average Price per Pound in US$', color = 'Type', title = 'Average Selling Price by Week', caption = 'Not adjusted for inflation\nSource: Hass Avocado Board') + ylim(min = 0, max = 3.0) + theme(plot.background = element_rect(fill = "grey20"), plot.title = element_text(color = "#FFFFFF"), axis.title = element_text(color = "#FFFFFF"), axis.text.x = element_text(color = 'grey50', angle = 45, hjust = 1), axis.text.y = element_text(color = 'grey50'), plot.caption = element_text(color = 'grey75'), panel.background = element_blank(), panel.grid.major = element_line(color = "grey50", size = 0.2), panel.grid.minor = element_line(color = "grey50", size = 0.2), legend.background = element_rect(fill = 'grey20'), legend.key = element_rect(fill = 'grey20'), legend.title = element_text(color = 'grey75'), legend.text = element_text(color = 'grey75'), legend.position = c(0.815, 0.85) ) ``` Unsurprisingly, we can see that the average selling price for organic avocados tends to be higher than the average selling price for non-organic avocados. Note how there seems to be a fairly large spike in selling price in late 2017. ### Non Organic Avocado Sales Revenue ```{r fig.height=5, fig.width=8} hass_usa %>% mutate( nonorg_rev = (plu4046+plu4225+plu4770+small_nonorg_bag+large_nonorg_bag+xlarge_nonorg_bag) * avg_price_nonorg ) %>% ggplot(aes(x = week_ending)) + geom_line(aes(y = nonorg_rev/1000000), color = 'steelblue') + labs(x = 'Year', y = 'Revenue (Millions US$)', color = 'Type', title = 'Non Organic Avocado Sales Revenue', caption = 'Not adjusted for inflation\nSource: Hass Avocado Board') + theme(plot.background = element_rect(fill = "grey20"), plot.title = element_text(color = "#FFFFFF"), axis.title = element_text(color = "#FFFFFF"), axis.text.x = element_text(color = 'grey50'), axis.text.y = element_text(color = 'grey50'), plot.caption = element_text(color = 'grey75'), panel.background = element_blank(), panel.grid.major = element_line(color = "grey50", size = 0.2), panel.grid.minor = element_line(color = "grey50", size = 0.2), legend.background = element_rect(fill = 'grey20'), legend.key = element_rect(fill = 'grey20'), legend.title = element_text(color = 'grey75'), legend.text = element_text(color = 'grey75'), # legend.position = c(0.815, 0.2) legend.position = 'none' ) ``` We may have switched gears from looking at average selling price to revenue. However, notice the massive drop in revenue in late 2017 - which seems to correspond to the increase in average selling price for the same time period. Let's dive further into that 'dip'. ```{r fig.height=5, fig.width=8} hass_usa %>% mutate( nonorg_rev = (plu4046+plu4225+plu4770+small_nonorg_bag+large_nonorg_bag+xlarge_nonorg_bag) * avg_price_nonorg ) %>% select(week_ending, nonorg_rev) %>% filter(week_ending > lubridate::ymd('2017-09-10') & week_ending < lubridate::ymd('2017-10-15')) %>% ggplot(aes(x = week_ending, y = nonorg_rev)) + geom_line(color = 'steelblue') + labs(x = 'Year', y = 'Revenue', color = 'Type', title = 'Non Organic Avocado Sales Revenue', caption = 'Not adjusted for inflation\nSource: Hass Avocado Board') + theme(plot.background = element_rect(fill = "grey20"), plot.title = element_text(color = "#FFFFFF"), axis.title = element_text(color = "#FFFFFF"), axis.text.x = element_text(color = 'grey50'), axis.text.y = element_text(color = 'grey50'), plot.caption = element_text(color = 'grey75'), panel.background = element_blank(), panel.grid.major = element_line(color = "grey50", size = 0.2), panel.grid.minor = element_line(color = "grey50", size = 0.2), legend.background = element_rect(fill = 'grey20'), legend.key = element_rect(fill = 'grey20'), legend.title = element_text(color = 'grey75'), legend.text = element_text(color = 'grey75'), legend.position = 'none' ) ``` So we can see that the revenue wasn't exactly 0. However, it definitely was much lower (way less than a US$1 million). A quick search on the Internet shows that around that time frame, there was a major avocado supply [shortage](https://laist.com/2017/08/17/avocado_shortage.php) in the US.