Reputation: 87
My dataframe looks like this:
df
city year wealth
a 2001 1
a 2002 30
b 2001 2
b 2002 20
c 2001 3
c 2002 10
I'm looking for a simple way to subset the dataframe based on city wealth relative only to cities within each year. So I'm going for an output like this:
top_third
city year wealth
a 2002 30
c 2001 3
mid_third
city year wealth
b 2001 2
b 2002 20
low_third
city year wealth
c 2002 10
a 2001 1
The approach I've been trying looks like this:
top_third <- subset(df, wealth > quantile(wealth, 0.66, na.rm = TRUE))
non_rich <- subset(df, wealth <= quantile(wealth, 0.66, na.rm = TRUE))
mid_third <- subset(non_rich, wealth > quantile(wealth, 0.5, na.rm = TRUE))
low_third <- subset(non_rich, wealth <= quantile(wealth, 0.5, na.rm = TRUE))
The biggest problem I'm having with this approach is that I can't find a way to calculate the quantile within each year. Does anyone know a simple way to do this?
Upvotes: 0
Views: 856
Reputation: 13118
Here's an approach using the dplyr package. We group the data by year, then create a new column, group, that records which third of that year's wealth distribution each city falls in (1 = lowest, 3 = highest). We can then split the dataset on that column:
library(dplyr)
df <- df %>%
  group_by(year) %>%
  mutate(group = cut(wealth,
                     c(-Inf, quantile(wealth, c(1/3, 2/3)), Inf),
                     labels = 1:3))
split(df, df$group)
# $`1`
# Source: local data frame [2 x 4]
# Groups: year [2]
# city year wealth group
# <fctr> <int> <int> <fctr>
# 1 a 2001 1 1
# 2 c 2002 10 1
# $`2`
# Source: local data frame [2 x 4]
# Groups: year [2]
# city year wealth group
# <fctr> <int> <int> <fctr>
# 1 b 2001 2 2
# 2 b 2002 20 2
# $`3`
# Source: local data frame [2 x 4]
# Groups: year [2]
# city year wealth group
# <fctr> <int> <int> <fctr>
# 1 a 2002 30 3
# 2 c 2001 3 3
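If you'd rather not depend on dplyr, the same per-year grouping can probably be done in base R with ave(). This is only a sketch: the helper name tercile and the list name thirds are made up for illustration, the data frame is rebuilt from the question so the snippet runs on its own, and na.rm = TRUE is carried over from the question's attempt. Note that cut() will stop with an error if the two break points coincide (for example, when a year has many tied wealth values).

```r
# Example data copied from the question.
df <- data.frame(
  city   = c("a", "a", "b", "b", "c", "c"),
  year   = c(2001, 2002, 2001, 2002, 2001, 2002),
  wealth = c(1, 30, 2, 20, 3, 10)
)

# Hypothetical helper (not part of the answer above): returns 1, 2 or 3
# for the lowest, middle and highest third of a numeric vector.
tercile <- function(w) {
  breaks <- c(-Inf, quantile(w, c(1/3, 2/3), na.rm = TRUE), Inf)
  cut(w, breaks, labels = FALSE)
}

# ave() applies the helper separately to the wealth values of each year.
df$group <- ave(df$wealth, df$year, FUN = tercile)

# Split on the new column and give the pieces the names used in the question.
thirds <- split(df, df$group)
names(thirds) <- c("low_third", "mid_third", "top_third")[as.integer(names(thirds))]

thirds$top_third
```

After this, thirds$low_third, thirds$mid_third and thirds$top_third are ordinary data frames, so they can be used wherever the subset() results in the question were going to be used.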
Upvotes: 1