Sharath

Reputation: 2267

Filter the data by joining 2 columns (strings containing comma) in R

I have a df

ID <- c('DX154','DX154','DX155','DX155','DX156','DX157','DX158','DX159') 
Country <- c('US','US','US','US')
Level <- c('Level_1A','Level_1A','Level_1B','Level_1B','Level_1A','Level_1B','Level_1B','Level_1A')
Type_A <- c('Iphone','Iphone','Android','Android','aaa','bbb','ccc','ddd')
Type_B <- c("Iphone,Ipad,Ipod,Mac","Gmail,Android,Drive,Maps","Iphone,Ipad,Ipod,Mac","Gmail,Android,Drive,Maps","ALL","ALL","ALL","ALL")
df <- data.frame(ID ,Country ,Level ,Type_A,Type_B)

df

     ID Country    Level  Type_A                   Type_B
1 DX154      US Level_1A  Iphone     Iphone,Ipad,Ipod,Mac
2 DX154      US Level_1A  Iphone Gmail,Android,Drive,Maps
3 DX155      US Level_1B Android     Iphone,Ipad,Ipod,Mac
4 DX155      US Level_1B Android Gmail,Android,Drive,Maps
5 DX156      US Level_1A     aaa                      ALL
6 DX157      US Level_1B     bbb                      ALL
7 DX158      US Level_1B     ccc                      ALL
8 DX159      US Level_1A     ddd                      ALL

I am trying to filter this data frame by joining the columns Type_A and Type_B, but I don't know how to parse the commas. Could someone please help me with this?

My desired output is

     ID Country    Level  Type_A                   Type_B
1 DX154      US Level_1A  Iphone     Iphone,Ipad,Ipod,Mac
2 DX155      US Level_1B Android Gmail,Android,Drive,Maps
3 DX156      US Level_1A     aaa                      ALL
4 DX157      US Level_1B     bbb                      ALL
5 DX158      US Level_1B     ccc                      ALL
6 DX159      US Level_1A     ddd                      ALL

Upvotes: 1

Views: 198

Answers (2)

akrun

Reputation: 887691

We group by 'ID' and use grepl, building the pattern by pasting the 'Type_A' values of each group together with '|'. (In this example, Type_A[1L] would also work because the 'Type_A' elements are duplicated within each ID; a better example would be nice.) The result is used to filter the rows. A second grepl keeps the rows whose 'Type_B' contains no comma anywhere from the start (^) to the end ($) of the string.

library(dplyr)
df %>% 
     group_by(ID) %>%
     filter(grepl(paste(Type_A, collapse='|'),
            Type_B)|grepl('^[^,]+$', Type_B))

#     ID Country    Level  Type_A                   Type_B
#1 DX154      US Level_1A  Iphone     Iphone,Ipad,Ipod,Mac
#2 DX155      US Level_1B Android Gmail,Android,Drive,Maps
#3 DX156      US Level_1A     aaa                      ALL
#4 DX157      US Level_1B     bbb                      ALL
#5 DX158      US Level_1B     ccc                      ALL
#6 DX159      US Level_1A     ddd                      ALL
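
To make the two conditions easier to follow, here is a small base R sketch that evaluates them outside of dplyr. The vectors type_a and type_b are illustrative names holding sample values copied from the question; they are not part of the answer's code.

```r
# Sample values taken from the question's Type_A and Type_B columns
# (type_a and type_b are illustrative names, not from the original code).
type_a <- c("Iphone", "Iphone")
type_b <- c("Iphone,Ipad,Ipod,Mac", "Gmail,Android,Drive,Maps", "ALL")

# Collapsing the group's Type_A values with "|" gives a regex alternation.
pattern <- paste(type_a, collapse = "|")

# TRUE where any of the alternatives occurs somewhere in the string.
has_type_a <- grepl(pattern, type_b)

# TRUE where the whole string, from start to end, contains no comma.
no_comma <- grepl("^[^,]+$", type_b)

print(data.frame(type_b, has_type_a, no_comma))
```

A row survives the filter when either logical vector is TRUE for it, which is what the | in the filter() call expresses.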

Upvotes: 2

Benjamin

Reputation: 17279

Here's one solution. It's kind of gimmicky, but someone will be along to give you the super clever and speedy version soon. This does it row-wise, but Akrun's answer shows you how to do it by id only.

library(dplyr)
df <- df %>%
  mutate(row_id = 1:n()) %>%   # one group per row, so grepl() sees a single pattern
  group_by(row_id) %>%
  filter(grepl(Type_A, Type_B) | Type_B == "ALL")   # == is R's equality operator
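
Note that grepl treats Type_A as a regular expression and matches substrings, so a value such as "Mac" would also match an entry like "Macbook". If exact matches against the comma-separated entries are wanted, one possible row-wise alternative in base R is sketched below. It is a variation on the idea above, not the method from either answer, and the names tokens, keep, and result are illustrative.

```r
# Rebuild the question's data frame so the sketch is self-contained.
df <- data.frame(
  ID      = c('DX154','DX154','DX155','DX155','DX156','DX157','DX158','DX159'),
  Country = 'US',
  Level   = c('Level_1A','Level_1A','Level_1B','Level_1B',
              'Level_1A','Level_1B','Level_1B','Level_1A'),
  Type_A  = c('Iphone','Iphone','Android','Android','aaa','bbb','ccc','ddd'),
  Type_B  = c("Iphone,Ipad,Ipod,Mac", "Gmail,Android,Drive,Maps",
              "Iphone,Ipad,Ipod,Mac", "Gmail,Android,Drive,Maps",
              "ALL", "ALL", "ALL", "ALL"),
  stringsAsFactors = FALSE
)

# Split each Type_B string on literal commas into a list of character vectors.
tokens <- strsplit(df$Type_B, ",", fixed = TRUE)

# keep[i] is TRUE when Type_A[i] equals one of the entries of Type_B in row i.
keep <- mapply(function(a, b) a %in% b, df$Type_A, tokens, USE.NAMES = FALSE)

# Rows whose Type_B is the literal "ALL" are kept unconditionally.
result <- df[keep | df$Type_B == "ALL", ]
print(result)
```

Because the comparison uses %in% on the split entries, no regular expression is involved and special characters in Type_A need no escaping.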

Upvotes: 3
