Reputation: 105
I need to split the data in a string column and map the values against a list.
df
Id String
1 JHA PQR 20 STO KJAN
2 LKS JHA PLA; NIYM
3 LMA\KHA 20 HYS,KNSN
4 JHA, PQR STO 20 KJAM
5 JHA PQR|STO/KJAOP
List_to_map = ['JHA', 'LMA', 'STO', 'PQR', 'LKS']
df_output
Id String Values
1 JHA PQR 20 STO KJAN JHA+PQR+STO
2 LKS JHA PLA; NIYM LKS+JHA
3 LMA\KHA 20 HYS,KNSN LMA
4 JHA, PQR STO 20 KJAM JHA+PQR+STO
5 JHA PQR|STO/KJAOP JHA+PQR+STO
I need to check the values in the String column against the list: wherever values from the list appear in the string, they should be concatenated with + and stored in a new column, Values.
Upvotes: 0
Views: 65
Reputation: 863166
Use Series.str.findall with word boundaries around each value of the list, then join the matches together with Series.str.join:
pat = '|'.join(r"\b{}\b".format(x) for x in List_to_map)
df['Values'] = df['String'].astype(str).str.findall(pat).str.join('+')
print (df)
Id String Values
0 1 JHA PQR STO KJAN JHA+PQR+STO
1 2 LKS JHA PLA; NIYM LKS+JHA
2 3 LMA\KHA HYS,KNSN LMA
3 4 JHA, PQR STO KJAM JHA+PQR+STO
4 5 JHA PQR|STO/KJAOP JHA+PQR+STO
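For reference, here is a self-contained sketch of the same approach, rebuilding the DataFrame from the question so it can be run as-is. The re.escape() call is an addition that is not in the answer above; it is only a precaution in case a list value ever contains a regex metacharacter.

```python
import re
import pandas as pd

List_to_map = ['JHA', 'LMA', 'STO', 'PQR', 'LKS']

# Sample data copied from the question
df = pd.DataFrame({
    'Id': [1, 2, 3, 4, 5],
    'String': [
        'JHA PQR 20 STO KJAN',
        'LKS JHA PLA; NIYM',
        'LMA\\KHA 20 HYS,KNSN',
        'JHA, PQR STO 20 KJAM',
        'JHA PQR|STO/KJAOP',
    ],
})

# One alternation of whole-word patterns: \bJHA\b|\bLMA\b|...
# re.escape is an added precaution, not part of the original answer
pat = '|'.join(r'\b{}\b'.format(re.escape(x)) for x in List_to_map)

# findall returns the matches in the order they occur in each string
df['Values'] = df['String'].astype(str).str.findall(pat).str.join('+')
print(df)
```

Because findall scans each string from left to right, the joined values follow the order in which they appear in String.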
Upvotes: 2
Reputation: 23217
You can use .str.split() to split on non-word characters with the regex \W, then get the elements in common with List_to_map using np.intersect1d(). Finally, join the matching strings with + using .str.join(), as follows:
import numpy as np
List_to_map = ['JHA', 'LMA', 'STO', 'PQR', 'LKS']
df['Values'] = df['String'].str.split(r'\W').apply(lambda x: np.intersect1d(x, List_to_map)).str.join('+')
Result:
print(df)
Id String Values
0 1 JHA PQR STO KJAN JHA+PQR+STO
1 2 LKS JHA PLA; NIYM JHA+LKS
2 3 LMA\KHA HYS,KNSN LMA
3 4 JHA, PQR STO KJAM JHA+PQR+STO
4 5 JHA PQR|STO/KJAOP JHA+PQR+STO
Alternatively, if you want to keep the order in which the values appear in the original string, you can also use:
df['Values'] = df['String'].str.split(r'\W').apply(lambda x: [y for y in x if y in List_to_map]).str.join('+')
Result:
print(df)
Id String Values
0 1 JHA PQR STO KJAN JHA+PQR+STO
1 2 LKS JHA PLA; NIYM LKS+JHA
2 3 LMA\KHA HYS,KNSN LMA
3 4 JHA, PQR STO KJAM JHA+PQR+STO
4 5 JHA PQR|STO/KJAOP JHA+PQR+STO
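A possible variation on the order-preserving version, sketched here as a runnable example on the question's data. The set named wanted and the helper keep_listed are hypothetical names introduced for illustration; a set is used only so that each membership test does not scan the whole list.

```python
import pandas as pd

List_to_map = ['JHA', 'LMA', 'STO', 'PQR', 'LKS']
wanted = set(List_to_map)  # hypothetical helper: set lookup instead of list scan

# Sample data copied from the question
df = pd.DataFrame({
    'Id': [1, 2, 3, 4, 5],
    'String': [
        'JHA PQR 20 STO KJAN',
        'LKS JHA PLA; NIYM',
        'LMA\\KHA 20 HYS,KNSN',
        'JHA, PQR STO 20 KJAM',
        'JHA PQR|STO/KJAOP',
    ],
})

def keep_listed(tokens):
    # Keep only tokens found in the list, in their original order
    return [t for t in tokens if t in wanted]

# Split on runs of non-word characters, filter, then join with '+'
df['Values'] = df['String'].str.split(r'\W+').apply(keep_listed).str.join('+')
print(df)
```

Splitting on \W+ rather than \W avoids empty tokens where two separators are adjacent (for example "; "), although empty tokens would be filtered out by the membership test either way.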
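Note that np.intersect1d() returns the sorted, unique common values, so the joined result follows neither the order of the original string nor the order of List_to_map (for the row with Id 2 it gives JHA+LKS rather than LKS+JHA). I found np.intersect1d() faster than the Python list comprehension, though this is worth timing on your own data. If the order of the concatenated values is not important, I would recommend np.intersect1d().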
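To make the ordering difference concrete, here is a small sketch that applies both methods to a single string from the question (the row with Id 2). The variable names are illustrative only.

```python
import re
import numpy as np

List_to_map = ['JHA', 'LMA', 'STO', 'PQR', 'LKS']

# Same split as .str.split(r'\W') would perform on this row
tokens = re.split(r'\W', 'LKS JHA PLA; NIYM')

# np.intersect1d returns the sorted, unique values common to both inputs
sorted_unique = np.intersect1d(tokens, List_to_map).tolist()

# The list comprehension keeps the order of appearance in the string
in_order = [t for t in tokens if t in List_to_map]

print('+'.join(sorted_unique))  # JHA+LKS
print('+'.join(in_order))       # LKS+JHA
```

Since np.intersect1d() also removes duplicates, a value that occurs twice in a string would appear once in its result but twice in the list-comprehension result.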
Upvotes: 2