haimen
haimen

Reputation: 2015

Filter rows based on the dplyr groupby, summarize output

I have a dataset with two columns, metro, state. I give the following command in dplyr,

data %>% group_by(metro, State) %>% summarise(count = n())

I get the following output,

metro           State         count 
A                OH            703
A                NJ              3
B                GA           1453
B                CA            456
B                WA            123

I now want to filter out the rows in the dataframe which are only the maximum counts and leave out the remaining. I need to filter out the corresponding rows. The output after filtering out the rows for the following command should be,

data %>% group_by(metro, State) %>% summarise(count = n())

   metro           State         count 
    A                OH            703
    B                GA           1453

Where every metro has only state which is the state with maximum counts and remaining are removed.

The following is my trying,

data %>% group_by(metro, State) %>% filter(n() == max(n()))

But this is again giving out the same dataframe as input.

Can anybody help me in doing this? My output should be every metro should have a unique state which has the maximum counts and the remaining state entries should be removed.

Thanks

Upvotes: 1

Views: 1067

Answers (2)

akrun
akrun

Reputation: 886938

We can also use data.table

library(data.table)
setDT(data)[,  count := .N , .(metro, state)][,  .SD[count == max(count)] , .(metro)]

Upvotes: 0

akuiper
akuiper

Reputation: 214927

You need a double stage groupby, firstly groupby metro and state get the count and then groupby metro and filter out count that is not equal to the max count within each metro:

data1 <- data %>% group_by(metro, State) %>% mutate(count = n()) %>% 
                  group_by(metro) %>% filter(count == max(count))

nrow(data1)

Upvotes: 4

Related Questions