Damien Dotta
Damien Dotta

Reputation: 939

How to filter rows by group

I would like from this tibble :

mydf <- tribble(
  ~siren_us, ~eff,
  "A",   3,
  "A",   2,
  "B",   1,
  "B",   NA,
  "C",   3,
  "C",   5,
  "C",   4,
  "D",   NA,
  "E",   1,
  "E",   NA
)

get this one :

more precisely, I want to retrieve the rows from mydf for which siren_us exists at least twice and which have the eff column equal to NA. In my example, I don't want the row "D" in result

result <- tribble(
  ~siren_us, ~eff,
  "B",   NA,
  "E",   NA
)

I think it could be madewith dplyr instructions group_by and filter ?

Many thanks in advance

Upvotes: 1

Views: 81

Answers (2)

AnilGoyal
AnilGoyal

Reputation: 26218

Do this to get results as desired. Use & within the filter statement.

dplyr method

mydf %>% group_by(siren_us) %>%
  filter(is.na(eff) & n() >1)
# A tibble: 2 x 2
# Groups:   siren_us [2]
  siren_us   eff
  <chr>    <dbl>
1 B           NA
2 E           NA

base R menthod

mydf[is.na(mydf$eff) & as.numeric(ave(mydf$siren_us, mydf$siren_us, FUN = length)) > 1,]

# A tibble: 2 x 2
  siren_us   eff
  <chr>    <dbl>
1 B           NA
2 E           NA

If you want to get all results with any NA, through base R, use this

mydf[ave(mydf$eff, mydf$siren_us, FUN = function(x){sum(is.na(x))}) >= 1 & 
       ave(mydf$siren_us, mydf$siren_us, FUN = length) > 1,]
# A tibble: 4 x 2
  siren_us   eff
  <chr>    <dbl>
1 B            1
2 B           NA
3 E            1
4 E           NA

Upvotes: 3

Sinh Nguyen
Sinh Nguyen

Reputation: 4487

Using group_by & filter from dplyr package

library(dplyr)
mydf %>%
  group_by(siren_us) %>%
  filter(
    # group that have at least two records
    n() >= 2 &
    # and at least one eff is.na
    any(is.na(eff)))

Output

# A tibble: 4 x 2
# Groups:   siren_us [2]
  siren_us   eff
  <chr>    <dbl>
1 B            1
2 B           NA
3 E            1
4 E           NA

Upvotes: 2

Related Questions