aman
aman

Reputation: 151

How to find strings

I have a df as below

df:

Name               |Code
-------------------+-----
aman               |abc
akash              |bcd
rudra              |cde
Tushar             |def
Kartik             |efg
aman,akash         |fgh
akash,rudra        |ghi
akash,rudra,aman   |ijk
aman,Tushar        |jkl
Kartik,Tushar      |klm
rudra,Kartik,akash |lmn

i want to search code for below df

Name            |
----------------+
aman,akash,rudra|
Tushar,aman     |
Kartik          |
rudra,akash     |

to get below result

Name            |code
----------------+-----
aman,akash,rudra|ijk
Tushar,aman     |jkl
Kartik          |efg
rudra,akash     |ghi

Please note that combo of "rudra,akash"occur three times, in that case it returns the result alphabatical order of code

let me know if there is some way to achieve this.

Upvotes: 2

Views: 69

Answers (1)

akrun
akrun

Reputation: 887511

We can use cSplit from splitstackshape to split the 'Name' column and reshape it to 'long' format ('dfN'), grouped by 'Code', and run-length-id of 'Code', we paste the 'Name' after sorting. Do the same with 'df2', match the 'Name' columns in both the dataset and get the 'Code' in 'dfN' that corresponds to create a new column 'code' in 'df2'

library(splitstackshape)
dfN <- cSplit(df, "Name", ",", "long")[, .(Name = paste(sort(Name), collapse=",")), 
             by =  .(grp = rleid(Code), Code)]
df2$grp <- seq_len(nrow(df2))
df2$code <-  cSplit(df2, "Name", ",", "long")[, .(Name = paste(sort(Name), 
     collapse=",")), .(grp)][, dfN$Code[match(Name, dfN$Name)]]
df2$grp <- NULL
df2
#              Name code
#1 aman,akash,rudra  ijk
#2      Tushar,aman  jkl
#3           Kartik  efg
#4      rudra,akash  ghi

Upvotes: 1

Related Questions