--- title: "Analyzing Iowa Liquor Sales" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Analyzing Iowa Liquor Sales} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE, comment = "#>", fig.height = 5, fig_width = 8, warning = FALSE ) ``` The {ialiquor} package provides a monthly summary Class E liquor sales in the state of Iowa by county or category between January 2015 and October 2020. This document is designed to demonstrate some potential use cases/analyses. ## Dataset Preview Let us quickly preview the main dataset. We will also store the dataset in a new variable called `liquor` for ease of use. ```{r setup} library(ialiquor) data("liquor_sales") liquor <- liquor_sales ``` Let's preview the sales by county: ```{r preview_liquor} dplyr::glimpse(liquor) ``` The `type` column is a derived column (not found on the original dataset) that is a higher level grouping column for more specific types (i.e., categories) of liquor. For instance, amaretto is considered to be a liqueur and bourbon/scotch is just whiskey. It is important to keep in mind that this variable (`type`) is arbitrary and defined by the package author. ## Exploratory Data Analysis Luckily for us, all of variables in each of the datasets are coded correctly. Let's focus on answering some of the key questions: - Retail Revenue by County - Revenue per Person - Profit by Type of Liquor - 2019 Profit by Type Breakdown - 2019 Top Categories - Proportion of Costs by Type ### Top Retail Revenue Counties Each Year ```{r usecase-1} library(dplyr) library(ggplot2) liquor %>% group_by(year, county) %>% summarize( total_revenue = sum(state_revenue), .groups = 'drop' ) %>% ungroup() %>% group_by(year) %>% slice(which.max(total_revenue)) ``` It's interesting to note that Polk county had the most revenue. Why is that? Let's take a look at the top county for each year in terms of population ```{r} liquor %>% group_by(year, county) %>% summarize( population = max(population), .groups = 'drop' ) %>% ungroup() %>% group_by(year) %>% slice(which.max(population)) ``` Seems as if Polk County is the largest county and it may be intuitive to assume that there are more persons buying liquor in this county. So let's look at revenue per person (based on population of the year), will we see different results? ```{r} liquor %>% mutate( rev_per_person = round(state_revenue / population , 2) ) %>% group_by(year, county) %>% summarize( rev_per_person = round(sum(rev_per_person),2), .groups = 'drop' ) %>% ungroup() %>% group_by(year) %>% slice(which.max(rev_per_person)) ``` Now this is interesting. Seems as if Dickinson county has the highest revenue per person. Keep in mind that the population values also represent individuals who are not of legal age to consume liquor (i.e., under the legal age of 21). Due to the large data size, this package does not contain the brand of liquor. However, this can easily be obtained via the Iowa Data Portal for further analysis. ### Top Types of Liquor by Profit to State Perhaps it is interesting to look at the profit to the state by type of liquor over time. ```{r, message=FALSE, warning = FALSE, fig.height=5, fig.width=8} liquor %>% filter(!type %in% c('beer','other','unknown')) %>% mutate( profit = state_revenue - state_cost ) %>% group_by(year, type) %>% summarize( total_profit = sum(profit) / 1000000, .groups = 'drop' ) %>% ggplot(aes(x = year)) + geom_line(aes(y = total_profit, col = type)) + labs( x = "Year Month", y = "Profit in Million US$ (not adjusted for Inflation)", title = "Annual Profit from Class E Liquor Sales by Liquor Type", caption = 'Source: Iowa Data Portal', color = 'Liquor Type' ) ``` From the chart above, we can see that whiskey, vodka, and rum account for quite a bit of profit for the state as compared to the other types of liquor. Anecdotally speaking, when I shop at my local liquor store, I do find many types of whiskeys. However, I'm surprised to see rum in the top three as I don't see as many rum varieties as compared to tequila or liqueur. ### Top Profit Makers for 2019 Let's focus on 2019 and see which types of liquor accounted for the vast majority. For this, we'll make a pie chart. The conclusions may be similar to before, but in this case, we can see the impact of the different types for one year a bit more clearly. ```{r, fig.height=5, fig.width=8} profit <- liquor %>% filter(!type %in% c('beer','other','unknown')) %>% mutate( profit = state_revenue - state_cost ) %>% filter(year == 2019) %>% group_by(type) %>% summarize( total_profit = round(sum(profit) / 1000000 , 0), .groups = 'drop' ) %>% ungroup() %>% mutate( pct = paste0((round(total_profit / sum(total_profit) , 2)) * 100,'%') ) %>% arrange(desc(type)) %>% mutate(lab.ypos = cumsum(total_profit) - 0.5*total_profit) profit %>% ggplot(aes(x = "", y = total_profit, fill = type)) + geom_bar(width = 1, stat = 'identity', color = 'white') + coord_polar("y", start = 0) + geom_text(aes(y = lab.ypos, label = pct), color = 'white') + theme_void() + labs( fill = 'Liquor Type', title = '2019 Profit Percentage by Liquor Type', caption = 'Source: Iowa Data Portal' ) ``` ### Proportion of Costs to State Last but not least, let’s take a look at the proportion of costs to the state. These are costs paid by the state to purchase the liquor from the appropriate vendors. ```{r, fig.height=5, fig.width=8} liquor_sales %>% filter(!type %in% c('beer','other','unknown')) %>% group_by(year_month, type) %>% summarize(total_cost = sum(state_cost), .groups = 'drop') %>% ungroup() %>% group_by(year_month) %>% mutate( cost_pct = round(total_cost / sum(total_cost) , 2) ) %>% ggplot(aes(x = year_month, y = cost_pct, fill = type)) + geom_area(alpha = 0.8, color = 'steelblue') + labs(y = 'Proportion of Cost (US$ - Not Adjusted for Inflation)', x = 'Year/Month', title = 'Proportion of State Costs of Purchasing Liquor From Vendors', caption = 'Source: Iowa Data Portal') ```