Reputation: 2015
I have a dataset with two columns, metro, state. I give the following command in dplyr,
data %>% group_by(metro, State) %>% summarise(count = n())
I get the following output,
metro State count
A OH 703
A NJ 3
B GA 1453
B CA 456
B WA 123
I now want to filter out the rows in the dataframe which are only the maximum counts and leave out the remaining. I need to filter out the corresponding rows. The output after filtering out the rows for the following command should be,
data %>% group_by(metro, State) %>% summarise(count = n())
metro State count
A OH 703
B GA 1453
Where every metro has only state which is the state with maximum counts and remaining are removed.
The following is my trying,
data %>% group_by(metro, State) %>% filter(n() == max(n()))
But this is again giving out the same dataframe as input.
Can anybody help me in doing this? My output should be every metro should have a unique state which has the maximum counts and the remaining state entries should be removed.
Thanks
Upvotes: 1
Views: 1067
Reputation: 886938
We can also use data.table
library(data.table)
setDT(data)[, count := .N , .(metro, state)][, .SD[count == max(count)] , .(metro)]
Upvotes: 0
Reputation: 214927
You need a double stage groupby, firstly groupby metro and state get the count and then groupby metro and filter out count that is not equal to the max count within each metro:
data1 <- data %>% group_by(metro, State) %>% mutate(count = n()) %>%
group_by(metro) %>% filter(count == max(count))
nrow(data1)
Upvotes: 4