R. Buchanan

Reputation: 87

Subset a dataframe based on within-group quantile

My dataframe looks like this:

df
city   year   wealth
a      2001   1
a      2002   30
b      2001   2
b      2002   20
c      2001   3
c      2002   10

I'm looking for a simple way to split the dataframe into thirds by wealth, where each city is ranked only against the other cities in the same year. The output I'm going for looks like this:

top_third
city    year   wealth
a       2002   30
c       2001   3

mid_third
city   year    wealth
b      2001    2
b      2002    20

low_third
city   year    wealth
c      2002    10
a      2001    1

The approach I've been trying looks like this:

top_third <- subset(df, wealth >  quantile(wealth, 0.66, na.rm = TRUE))
non_rich  <- subset(df, wealth <= quantile(wealth, 0.66, na.rm = TRUE))
mid_third <- subset(non_rich, wealth >  quantile(wealth, 0.5, na.rm = TRUE))
low_third <- subset(non_rich, wealth <= quantile(wealth, 0.5, na.rm = TRUE))

The biggest problem with this approach is that the quantiles are computed over the whole dataframe, and I can't find a way to calculate them within each year. Does anyone know a simple way to do this?

Upvotes: 0

Views: 856

Answers (1)

Weihuang Wong

Reputation: 13118

Here's an approach using the dplyr package. We group the data by year, then create a new column that indicates which third of that year's wealth distribution each city falls in (1 = lowest, 3 = highest). We can then split the dataset on that new group column:

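library(dplyr)
# Within each year, cut wealth at that year's 1/3 and 2/3 quantiles
df <- df %>% group_by(year) %>%
  mutate(group = cut(wealth,
                     c(-Inf, quantile(wealth, c(1/3, 2/3), na.rm = TRUE), Inf),
                     labels = 1:3))
# One data frame per third: `1` = low, `2` = mid, `3` = top
split(df, df$group)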
# $`1`
# Source: local data frame [2 x 4]
# Groups: year [2]

#     city  year wealth  group
#   <fctr> <int>  <int> <fctr>
# 1      a  2001      1      1
# 2      c  2002     10      1

# $`2`
# Source: local data frame [2 x 4]
# Groups: year [2]

#     city  year wealth  group
#   <fctr> <int>  <int> <fctr>
# 1      b  2001      2      2
# 2      b  2002     20      2

# $`3`
# Source: local data frame [2 x 4]
# Groups: year [2]

#     city  year wealth  group
#   <fctr> <int>  <int> <fctr>
# 1      a  2002     30      3
# 2      c  2001      3      3

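If you'd rather not depend on dplyr, the same idea can be sketched in base R with ave(), which applies a function within each group and returns the results in the original row order. This is only a sketch under a few assumptions: the data frame is rebuilt by hand from the question's example, `tercile` is a hypothetical helper name, and the within-year quantile breaks are distinct (cut() stops with an error if two breaks coincide, which can happen with heavy ties).

```r
# Rebuild the example data from the question (assumed column types)
df <- data.frame(
  city   = c("a", "a", "b", "b", "c", "c"),
  year   = c(2001, 2002, 2001, 2002, 2001, 2002),
  wealth = c(1, 30, 2, 20, 3, 10)
)

# Hypothetical helper: bin a numeric vector into thirds using its own
# 1/3 and 2/3 quantiles; returns integer codes 1 (low) to 3 (top)
tercile <- function(x) {
  breaks <- c(-Inf, quantile(x, c(1/3, 2/3), na.rm = TRUE), Inf)
  cut(x, breaks, labels = FALSE)
}

# ave() runs the helper separately for each year
df$group <- ave(df$wealth, df$year, FUN = tercile)

# Split into one data frame per third
thirds    <- split(df, df$group)
low_third <- thirds[["1"]]
mid_third <- thirds[["2"]]
top_third <- thirds[["3"]]
```

With the example data, top_third holds the rows for city a in 2002 and city c in 2001, matching the output asked for in the question. Rows with a missing wealth get an NA group and are dropped by split().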
Upvotes: 1
