Reputation: 183
I am trying to practice R and learn more in general. I would like to make a ratio of x crime per 100,000 people. The following is the head of my data. I decided to only use the 5 largest cities.
# A tibble: 6 x 13
City Popula~ `Viol~ `Mur~ `Rap~ `Rap~ Robbe~ `Aggr~ `Prop~ Burgl~ `Larc~ `Moto~ Arson
<chr> <dbl> <dbl> <dbl> <dbl> <lgl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Abingdon 8186 10.0 0 3.00 NA 1.00 6.00 233 20.0 198 15.0 4.00
2 Alexandria 148519 258 5.00 21.0 NA 118 114 2967 249 2427 291 13.0
3 Altavista 3486 8.00 0 0 NA 2.00 6.00 56.0 4.00 52.0 0 0
4 Amherst 2223 2.00 0 2.00 NA 0 0 27.0 6.00 19.0 2.00 0
5 Appalachia 1728 12.0 0 2.00 NA 2.00 8.00 77.0 25.0 51.0 1.00 0
6 Ashland 7310 26.0 0 1.00 NA 8.00 17.0 246 14.0 221 11.0 1.00
The following code is my attempt.
virginia_crime %>%
filter(Population > 180000) %>%
group_by(City) %>%
summarise(ratio_violent = `Violent
crime`/(Population/100000),
ratio_murder = `Murder and
nonnegligent
manslaughter`/(Population/100000))
The output is:
# A tibble: 5 x 3
City ratio_violent ratio_murder
<chr> <dbl> <dbl>
1 Chesapeake 320 3.90
2 Newport News 439 8.28
3 Norfolk 573 11.3
4 Richmond 624 17.4
5 Virginia Beach 162 3.77
I realize that I should be able to write a function that essentially creates a rate, something like rate <- crime columns / (Population / 100000). Am I even close with this idea, or should I be using one of the apply functions (sapply(summarise()))? I feel this task could be automated somehow, but I just cannot figure it out. I would appreciate some insight.
Upvotes: 0
Views: 1370
Reputation: 887971
Here is an option with mutate_at. The OP's code uses summarise, but summarise is meant to collapse an object with 'n' rows into a single row per group. The ratio here is computed for every row rather than collapsed into one, so mutate should be used in place of summarise.
library(dplyr)
df1 %>%
  filter(Population > 180000) %>%
  # columns 3 to 13 hold the crime counts; convert each to a rate per 100,000 people
  mutate_at(3:13, funs(. / (Population / 100000)))
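In dplyr 1.0.0 and later, mutate_at and funs are superseded by across, and the rate calculation can be pulled out into a small function, which is roughly what the OP was asking about. The following is only a sketch: the toy_crime data and the per_100k helper are made up for illustration and are not part of the OP's dataset.

```r
library(dplyr)

# Toy data with made-up numbers, for illustration only
toy_crime <- tibble(
  City       = c("A", "B", "C"),
  Population = c(200000, 400000, 50000),
  Violent    = c(600, 800, 100),
  Murder     = c(10, 20, 1)
)

# Hypothetical helper: incidents per 100,000 residents
per_100k <- function(count, population) count / (population / 100000)

toy_rates <- toy_crime %>%
  filter(Population > 180000) %>%
  # apply the helper to every column except City and Population
  mutate(across(-c(City, Population), ~ per_100k(.x, Population)))

toy_rates
```

Selecting columns by name (everything except City and Population) rather than by position 3:13 should also keep working if the column order changes.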
Upvotes: 2
Reputation: 78650
You can gather your columns (all besides city and population) first, which lets you operate on all of them at once:
library(tidyr)
crime_rates <- virginia_crime %>%
filter(Population > 180000) %>%
gather(Crime, Number, -City, -Population) %>%
mutate(Rate = Number / (Population / 100000))
This will end up with one row for each pair of city and crime, alongside the population, number, and rate.
If you want to turn it back into a wide form, you can use spread (after removing the Number column):
crime_rates %>%
select(-Number) %>%
spread(Crime, Rate)
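In tidyr 1.0.0 and later, gather and spread are superseded by pivot_longer and pivot_wider. A minimal sketch of the same round trip, using a made-up toy_crime table rather than the OP's data, might look like this:

```r
library(dplyr)
library(tidyr)

# Toy data with made-up numbers, for illustration only
toy_crime <- tibble(
  City       = c("A", "B"),
  Population = c(200000, 400000),
  Violent    = c(600, 800),
  Murder     = c(10, 20)
)

# Long form: one row per city and crime, with the rate per 100,000 people
toy_long <- toy_crime %>%
  pivot_longer(-c(City, Population), names_to = "Crime", values_to = "Number") %>%
  mutate(Rate = Number / (Population / 100000))

# Back to wide form: one column of rates per crime
toy_wide <- toy_long %>%
  select(-Number) %>%
  pivot_wider(names_from = Crime, values_from = Rate)

toy_wide
```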
It's worth noting that the gathered (tidied) version is still quite useful, for example if you want to find the cities with the highest rates of each crime (perhaps to use in a graph):
crime_rates %>%
  # group by crime so that top_n picks the city with the highest rate for each crime
  group_by(Crime) %>%
  top_n(1, Rate)
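In dplyr 1.0.0 and later, top_n is superseded by slice_max. As a rough sketch on made-up long-form data (toy_long is hypothetical, not the OP's table), finding the city with the highest rate for each crime could look like this:

```r
library(dplyr)

# Toy long-form data with made-up rates, for illustration only
toy_long <- tibble(
  City  = c("A", "B", "A", "B"),
  Crime = c("Violent", "Violent", "Murder", "Murder"),
  Rate  = c(300, 200, 4, 5)
)

top_by_crime <- toy_long %>%
  group_by(Crime) %>%
  # keep the single row with the largest Rate within each crime
  slice_max(Rate, n = 1) %>%
  ungroup()

top_by_crime
```

Grouping by City instead would answer a different question: which crime has the highest rate within each city.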
Upvotes: 2