We will answer this question using R. First, load the dataset. It contains state names, census regions, census divisions, 2013 populations and the number of registered gun owners (also in 2013). This data comes to us from CBS News.
# Set working directory
setwd("~/Desktop")
# Load dataset
library(readxl)
Firearms_Data_KAD <- read_excel("~/Desktop/R/Per Capita Firearms Registered/Firearms_Data_KAD.xlsx")
head(Firearms_Data_KAD)
## # A tibble: 6 x 6
## StateName StateLower CensusRegion Division `2013Pop` `2013Registered…
## <chr> <chr> <chr> <chr> <dbl> <dbl>
## 1 Wyoming wyoming West Mountain 582658 114052
## 2 District … district o… South South At… 646449 42897
## 3 Arkansas arkansas South West Sou… 2959373 123130
## 4 New Mexico new mexico West Mountain 2085287 84471
## 5 Virginia virginia South South At… 8260405 248939
## 6 Idaho idaho West Mountain 1612136 39019
Next, let’s load all of the packages we’ll use for data transformations and visualizations.
# Load needed packages
library(ggplot2)
library(RColorBrewer)
library(lattice)
library(ggthemes)
library(maps)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(sqldf)
## Loading required package: gsubfn
## Loading required package: proto
## Loading required package: RSQLite
library(choroplethr)
## Loading required package: acs
## Loading required package: stringr
## Loading required package: XML
##
## Attaching package: 'acs'
## The following object is masked from 'package:dplyr':
##
## combine
## The following object is masked from 'package:base':
##
## apply
Let’s do some basic analysis and create a preliminary boxplot.
# Add percentage value
Firearms_Data_KAD$Pct <- Firearms_Data_KAD$`2013RegisteredOwners`/Firearms_Data_KAD$`2013Pop`
# Add value for ordering bar charts
Firearms_Data_KAD$Order <- 1-Firearms_Data_KAD$Pct
# Box and whisker plot using Lattice
boxplot <- bwplot(Pct ~ Division, data=Firearms_Data_KAD,
xlab="US Census Regional Divisions",
ylab="PerCapita Registered Gun Owners",
main="How Heavily Armed is Your Division? (2013)")
boxplot
Next, let’s use ggplot2 and a Wall Street Journal Theme to make this easier on the eyes.
# Box and whisker plot using ggplot2
boxplot2 <- ggplot(Firearms_Data_KAD, aes(Division, Pct)) +
geom_boxplot() + ylab("PerCapita Registered Gun Owners") +
xlab("US Census Regional Divisions") +
labs(title="Division Owners")
boxplot2 + coord_flip() + theme_wsj()
Now let’s step back a level of detail and see if there are similarities in Census Regions.
boxplot3 <- ggplot(Firearms_Data_KAD, aes(CensusRegion, Pct)) +
geom_boxplot() + ylab("PerCapita Registered Gun Owners") +
xlab("US Census Regions") +
labs(title="Regions")
boxplot3 + coord_flip() + theme_wsj()
Next, let’s dive into state-level detail to see the outliers.
# Bar plot by state
# Order states
# states <- c("Wyoming", "District of Columbia", "Arkansas", "New Mexico", "Virginia", "Idaho", "Alabama", "Nevada", "Alaska", "Louisiana", "Pennsylvania", "Maryland", "New Hampshire", "Georgia", "Indiana", "Kentucky", "Utah", "Texas", "Oklahoma", "Colorado", "South Carolina", "South Dakota", "Ohio", "Oregon", "Connecticut", "Montana", "Tennessee", "North Carolina", "Kansas", "Florida", "Minnesota", "North Dakota", "Arizona", "West Virginia", "Illinois", "Maine", "Washington", "Missouri", "Wisconsin", "California", "Nebraska", "Mississippi", "Vermont", "Iowa", "New Jersey", "Hawaii", "Massachusetts", "Michigan", "Delaware", "Rhode Island", "New York")
Firearms_Data_KAD$StateName <- factor(Firearms_Data_KAD$StateName,
levels = Firearms_Data_KAD$StateName[order(Firearms_Data_KAD$Pct)])
Firearms_Data_KAD$StateName
## [1] Wyoming District of Columbia Arkansas
## [4] New Mexico Virginia Idaho
## [7] Alabama Nevada Alaska
## [10] Louisiana Pennsylvania Maryland
## [13] New Hampshire Georgia Indiana
## [16] Kentucky Utah Texas
## [19] Oklahoma Colorado South Carolina
## [22] South Dakota Ohio Oregon
## [25] Connecticut Montana Tennessee
## [28] North Carolina Kansas Florida
## [31] Minnesota North Dakota Arizona
## [34] West Virginia Illinois Maine
## [37] Washington Missouri Wisconsin
## [40] California Nebraska Mississippi
## [43] Vermont Iowa New Jersey
## [46] Hawaii Massachusetts Michigan
## [49] Delaware Rhode Island New York
## 51 Levels: New York Rhode Island Delaware Michigan ... Wyoming
# Bar plot by state
barplot <- qplot(StateName, data=Firearms_Data_KAD, geom="bar",
weight=Pct, ylab="PerCapita Registered Gun Owners")
barplot <- barplot + theme(axis.text.x = element_text(angle=90, hjust=1))
barplot + theme_wsj() + coord_flip()
What does this look like on a map? Do we see any regional trends?
Firearms_Data_KAD$region <- Firearms_Data_KAD$StateLower
Firearms_Data_KAD$value <- Firearms_Data_KAD$Pct
map <- state_choropleth(
Firearms_Data_KAD,
legend = "Pct",
title = "PerCapita Registered Gun Owners (2013)",
num_colors = 1,
zoom = NULL)
map + theme_map()
Rather than looking at distributions or state-specific data, what is the average gun ownership rate for each region?
# Average by Region
regiondata <- sqldf("select distinct Division, sum(`2013Pop`) as Pop, sum(`2013RegisteredOwners`) as Owners from Firearms_Data_KAD group by Division")
head(regiondata)
## Division Pop Owners
## 1 East North Central 46662180 430545
## 2 East South Central 18716202 249693
## 3 Middle Atlantic 34469280 204755
## 4 Mountain 22881245 466634
## 5 New England 14618806 112220
## 6 Pacific 51373178 415969
regiondata$Pct <- regiondata$Owners/regiondata$Pop
head(regiondata)
## Division Pop Owners Pct
## 1 East North Central 46662180 430545 0.009226851
## 2 East South Central 18716202 249693 0.013341008
## 3 Middle Atlantic 34469280 204755 0.005940217
## 4 Mountain 22881245 466634 0.020393733
## 5 New England 14618806 112220 0.007676414
## 6 Pacific 51373178 415969 0.008097007
regiondata$Division <- factor(regiondata$Division,
levels = regiondata$Division[order(regiondata$Pct)])
head(regiondata)
## Division Pop Owners Pct
## 1 East North Central 46662180 430545 0.009226851
## 2 East South Central 18716202 249693 0.013341008
## 3 Middle Atlantic 34469280 204755 0.005940217
## 4 Mountain 22881245 466634 0.020393733
## 5 New England 14618806 112220 0.007676414
## 6 Pacific 51373178 415969 0.008097007
barplot2 <- qplot(Division, data=regiondata, geom="bar",
weight=Pct, ylab="PerCapita Registered Gun Owners")
barplot2 + theme_wsj() + coord_flip()
In summation, we find the following:
However, don’t forget that the District of Columbia is #2
[THE END]