Adamm
Adamm

Reputation: 2306

Filter groups of df when each row of group fulfills the condition in at least N columns

I have a small trouble with my df. Firstly I'll show you an example, then explain what I'd like to recieve.

My input df:

C1  C2  C3  C4  C5  C6  C7  C8
A   I   I   D   X   I   I   I
A   I   I   I   X   D   I   I
A   I   I   I   X   I   I   I
A   I   D   I   X   NC  I   I
B   D   D   I   X   I   I   I
B   D   I   NC  X   I   I   D
C   NC  I   I   X   NC  D   I
C   I   I   I   X   I   I   I
C   I   I   I   X   I   I   D
D   NC  NC  I   X   D   D   D
D   I   I   I   X   D   D   I
D   D   D   I   X   I   I   NC
D   I   I   I   X   NC  I   I
E   NC  I   I   X   I   I   D
E   I   I   I   X   I   D   D

Desired result:

C1  C2  C3  C4  C5  C6  C7  C8
A   I   I   D   X   I   I   I
A   I   I   I   X   D   I   I
A   I   I   I   X   I   I   I
A   I   D   I   X   NC  I   I

I'd like to have only groups (group by column 'C1') (with all rows) where there're at least 2 occurances of 'I' in each row of group (let's take A) in group columns C2, C3, C4 and C6, C7, C8.

I decided to use filter(), all() and rowSums()

 df_filtered <- df %>%
  group_by(C1) %>%
  filter(all(rowSums(df[,2:4] == 'I' & df[,6:8] == 'I') >= 2))

wht doesn't it work? It returns 0 rows, no idea why...

Upvotes: 1

Views: 49

Answers (2)

duckmayr
duckmayr

Reputation: 16940

Solution

df %>%
    mutate(condition = rowSums(.[2:4] == 'I') >= 2 & rowSums(.[6:8] == 'I') >= 2) %>%
    group_by(C1) %>%
    filter(all(condition)) %>%
    select(-condition)

Result

# A tibble: 4 x 8
# Groups:   C1 [1]
      C1     C2     C3     C4     C5     C6     C7     C8
  <fctr> <fctr> <fctr> <fctr> <fctr> <fctr> <fctr> <fctr>
1      A      I      I      D      X      I      I      I
2      A      I      I      I      X      D      I      I
3      A      I      I      I      X      I      I      I
4      A      I      D      I      X     NC      I      I

Explanation

When you used

filter(all(rowSums(df[,2:4] == 'I' & df[,6:8] == 'I') >= 2))

the all() comparison was on all rows of df, not just the ones from your group. This approach evaluates the condition for each row, then calls all() only on the group.

Upvotes: 2

gavg712
gavg712

Reputation: 310

you could try with unite() and then filtering by regular expressions. Here your example:

library(tidyverse)

# First loading your data 
data <-read.table(text = "C1  C2  C3  C4  C5  C6  C7  C8
A   I   I   D   X   I   I   I
A   I   I   I   X   D   I   I
A   I   I   I   X   I   I   I
A   I   D   I   X   NC  I   I
B   D   D   I   X   I   I   I
B   D   I   NC  X   I   I   D
C   NC  I   I   X   NC  D   I
C   I   I   I   X   I   I   I
C   I   I   I   X   I   I   D
D   NC  NC  I   X   D   D   D
D   I   I   I   X   D   D   I
D   D   D   I   X   I   I   NC
D   I   I   I   X   NC  I   I
E   NC  I   I   X   I   I   D
E   I   I   I   X   I   D   D", header = T)

# Then filtering rows
data %>% 
  # Creating a helper column
  unite(merged, C1:C8, sep = "", remove = F) %>%
  # Filtering by regexp
  filter(grepl("^A", merged), grepl("II", merged)) %>%
  # Deleting helper column
  select(-merged)

This is the results:

  C1 C2 C3 C4 C5 C6 C7 C8
1  A  I  I  D  X  I  I  I
2  A  I  I  I  X  D  I  I
3  A  I  I  I  X  I  I  I
4  A  I  D  I  X NC  I  I

have fun ;)

Upvotes: 0

Related Questions