Puja Gupta
Puja Gupta

Reputation: 49

Case-when string matching

I have a df(Below) and a vector as c("B", "F"), I want to first split the df as list based on ID and then if any of the values in column "Name" matches in the vector c("B", "F") then assign column "Final" with that particular value in the vector for all rows.

ID  Name
1   A   
1   B   
2   C   
1   D   
2   E   
2   F
3   C

what I have tried till now is below :

df_list <- dlply(df, "ID")
df_list_2 <- lapply(df_list, transform, 
                                  Final = case_when(
                                     sum(str_count(grepl(Name, "B"))) >= 1 ~ "B",
                                     sum(str_count(grepl(Name, "F"))) >= 1 ~ "F",
                                     TRUE ~ "No"))

My Final result should be as below

List 1 :

ID  Name    Final
1   A         B
1   B         B
1   D         B

List 2:



ID  Name    Final

2   C         F
2   E         F
2   F         F

List 3 :

 ID Name    Final
 3      C       NO

This is just a sample data, I have to run this on million of records with the vector list containing about 20 string values

Upvotes: 1

Views: 1802

Answers (2)

Sotos
Sotos

Reputation: 51582

Here is a base R idea based on your description,

lapply(split(df, df$ID), function(i) {
       i1 <- i$Name[i$Name %in% v1]; 
       data.frame(i, Final = replace(i1, length(i1) == 0, 'NO'))
      })

which gives,

$`1`
  ID Name Final
1  1    A     B
2  1    B     B
4  1    D     B

$`2`
  ID Name Final
3  2    C     F
5  2    E     F
6  2    F     F

$`3`
  ID Name Final
7  3    C    NO

DATA:

dput(df)
structure(list(ID = c(1L, 1L, 2L, 1L, 2L, 2L, 3L), Name = c("A", 
"B", "C", "D", "E", "F", "C")), row.names = c(NA, -7L), class = "data.frame")

dput(v1)
c("B", "F")

EDIT: If you have more than 1 final elements, then you can convert to strings, i.e.

lapply(split(df, df$ID), function(i) {i1 <- i$Name[i$Name %in% v1]; 
                            data.frame(i, Final = ifelse(length(unique(i1)) > 1, 
                             toString(unique(i1)), ifelse(length(unique(i1)) == 0, 'NO', i1)))})

Upvotes: 2

kath
kath

Reputation: 7724

You can also use group_by from dplyr:

library(plyr)  # Load plyr first if you use it with dplyr
library(dplyr)

match_vector <- c("B", "F")

df_new <- df %>% 
  group_by(ID) %>% 
  mutate(Final = if_else(any(Name %in% match_vector), 
                         paste(match_vector[match_vector %in% Name], collapse = ";"), 
                         "No"))

df_new
# A tibble: 7 x 3
# Groups:   ID [3]
#     ID Name  Final
#   <int> <chr> <chr>
# 1     1 A     B    
# 2     1 B     B    
# 3     2 C     F    
# 4     1 D     B    
# 5     2 E     F    
# 6     2 F     F    
# 7     3 C     No   

I used paste(match_vector[match_vector %in% Name], collapse = ";") here, which would print all matches in the Final column, if there are several. If you don't want this use match_vector[match_vector %in% Name][1] to get the first match.

If you need the list-structure you can then use split (base R) or dlply (plyr):

dlply(df_new, "ID")
split(df_new, df_new$ID)

$`1`
# A tibble: 3 x 3
# Groups:   ID [1]
     ID Name  Final
  <int> <chr> <chr>
1     1 A     B    
2     1 B     B    
3     1 D     B    

$`2`
# A tibble: 3 x 3
# Groups:   ID [1]
     ID Name  Final
  <int> <chr> <chr>
1     2 C     F    
2     2 E     F    
3     2 F     F    

$`3`
# A tibble: 1 x 3
# Groups:   ID [1]
     ID Name  Final
  <int> <chr> <chr>
1     3 C     No 

Upvotes: 1

Related Questions