Chellarioo

Conditional subset of data frame by special condition

df1 <-
  data.frame(Sector=c("auto","auto","auto","industry","industry","industry"),
             Topic=c("1","2","3","3","5","5"), 
             Frequency=c(1,2,5,2,3,2))
df1

df2 <- 
  data.frame(Sector=c("auto","auto","auto"),
             Topic=c("1","2","3"), 
             Frequency=c(1,2,5))
df2

I have the data frame df1 above and want a conditional subset of it that looks like df2. The condition is as follows:

"If at least one observation of a sector has a frequency greater than 3, all observations of that sector should be kept; if not, all observations of that sector should be dropped." In the example above, only the three observations of the auto sector remain; industry is dropped because its largest frequency is 3.
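Put differently, a sector is kept exactly when its largest Frequency exceeds 3. A small base-R sketch that checks which sectors qualify in the example (df1 is recreated so the snippet runs on its own; `sector_max` and `keep` are illustrative names, not part of the question):

```r
df1 <- data.frame(Sector = c("auto", "auto", "auto", "industry", "industry", "industry"),
                  Topic = c("1", "2", "3", "3", "5", "5"),
                  Frequency = c(1, 2, 5, 2, 3, 2))

# Largest Frequency observed in each Sector (one value per sector)
sector_max <- tapply(df1$Frequency, df1$Sector, max)
sector_max

# Sectors with at least one Frequency above 3, i.e. whose maximum exceeds 3
keep <- names(sector_max)[sector_max > 3]
keep
```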

Does anybody have an idea which condition I could use to obtain this subset?

Answers (2)

jogo

Here is a solution with base R:

df1 <-
  data.frame(Sector=c("auto","auto","auto","industry","industry","industry"),
             Topic=c("1","2","3","3","5","5"), 
             Frequency=c(1,2,5,2,3,2))
# ave() gives every row the maximum Frequency of its Sector; rows are kept where that exceeds 3
subset(df1, ave(Frequency, Sector, FUN=max) > 3)
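To see why this works, it can help to look at the intermediate vector that ave() produces. A minimal sketch, assuming the same df1 (recreated here); `grp_max` is an illustrative name:

```r
df1 <- data.frame(Sector = c("auto", "auto", "auto", "industry", "industry", "industry"),
                  Topic = c("1", "2", "3", "3", "5", "5"),
                  Frequency = c(1, 2, 5, 2, 3, 2))

# ave() returns a vector as long as df1 in which each row holds the
# maximum Frequency of its own Sector (auto rows get 5, industry rows get 3)
grp_max <- ave(df1$Frequency, df1$Sector, FUN = max)
grp_max

# Comparing that vector with 3 gives one TRUE/FALSE per row, so whole
# sectors are kept or dropped together
df2 <- df1[grp_max > 3, ]
df2
```

Because the group maximum is copied onto every row of the group, the row-wise comparison never splits a sector: either all of its rows pass or none do.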

and a solution with data.table:

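library("data.table")
# setDT() converts df1 to a data.table by reference; for each Sector the j
# expression returns .SD (all rows of the group) if its maximum Frequency
# exceeds 3, and NULL otherwise, so those groups are dropped
setDT(df1)[, if (max(Frequency) > 3) .SD, by=Sector]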
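The data.table version relies on the fact that a group whose j expression evaluates to NULL (here, an `if` with no `else` whose condition is FALSE) contributes no rows. The same "return the whole group or nothing" idea can be sketched in base R with split(), Filter() and rbind(). This is a swapped-in base-R rendering for comparison, not how data.table works internally; `groups` and `kept` are illustrative names:

```r
df1 <- data.frame(Sector = c("auto", "auto", "auto", "industry", "industry", "industry"),
                  Topic = c("1", "2", "3", "3", "5", "5"),
                  Frequency = c(1, 2, 5, 2, 3, 2))

# One data frame per Sector
groups <- split(df1, df1$Sector)

# Keep only the groups in which at least one Frequency exceeds 3
kept <- Filter(function(g) any(g$Frequency > 3), groups)

# Stack the surviving groups back into a single data frame;
# unname() avoids row names such as "auto.1"
df2 <- do.call(rbind, unname(kept))
df2
```

One caveat of this sketch: if no sector qualifies, `kept` is an empty list and `do.call(rbind, ...)` returns NULL rather than an empty data frame, whereas the ave()/subset() version returns a zero-row data frame.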

www

We can use group_by and filter from dplyr to achieve this.

library(dplyr)

df2 <- df1 %>%
  group_by(Sector) %>%
  filter(any(Frequency > 3)) %>%
  ungroup()
df2
# # A tibble: 3 x 3
#   Sector Topic Frequency
#   <fct>  <fct>     <dbl>
# 1 auto   1            1.
# 2 auto   2            2.
# 3 auto   3            5.
