Reputation: 139
I am trying to find which substring from a list occurs in each value of a DataFrame column.
list =['ab', 'bc', 'ca']
DF1
Index|A
0 |ajbijio_ab_jadds
1 |bhjbj_ab_jiui
Expected OUTPUT:
DF
ab
ab
I tried the following, but it raises TypeError: unhashable type: 'list'
DF1['A'].str.lower().str.contains(list)
Upvotes: 1
Views: 59
Reputation: 323226
I would use findall with the list joined by | and take the first match:
df["Found"] = df["A"].str.findall("|".join(lst)).str[0]
df
Out[82]:
A Found
0 ajbijio_ab_jadds ab
1 bhjbj_ab_jiui ab
2 Hello World NaN
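One caveat, as a rough sketch: joining with | makes every list item part of a regex, so items containing characters such as . or + would be interpreted as regex syntax. Escaping each item with re.escape first avoids that. The list and column values below are made up for illustration and are not from the question.

```python
import re
import pandas as pd

# Hypothetical search terms that contain regex metacharacters
lst = ["a.b", "c+d"]
df = pd.DataFrame({"A": ["x_a.b_y", "x_aXb_y", "c+d"]})

# Escape each term so "." and "+" are matched literally
pattern = "|".join(re.escape(s) for s in lst)

# findall returns a list per row; .str[0] takes the first match
# and yields NaN when the list is empty
df["Found"] = df["A"].str.findall(pattern).str[0]
print(df)
```

Without the escaping, "a.b" would also match "aXb" in the second row.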
Upvotes: 1
Reputation: 862406
Use Series.str.extract if you need only the first match, joining the list by | for regex OR:
L =['ab','bc','ca']
df['new'] = df['A'].str.extract('('+ '|'.join(L) + ')')
print (df)
A new
0 ajbijio_ab_jadds ab
1 bhjbj_ab_jiui ab
If you need all matches, use Series.str.findall with Series.str.join:
df['new'] = df['A'].str.findall('|'.join(L)).str.join(',')
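To make the difference from the first-match version visible, here is a small self-contained sketch. The sample rows are assumptions chosen so that one row has two matches and one has none.

```python
import pandas as pd

L = ["ab", "bc", "ca"]
# Sample rows are illustrative: two matches, one match, no match
df = pd.DataFrame({"A": ["ab_bc_x", "bhjbj_ab_jiui", "Hello World"]})

# findall collects every non-overlapping match per row,
# then str.join turns each list into a comma-separated string
df["new"] = df["A"].str.findall("|".join(L)).str.join(",")
print(df["new"].tolist())  # ['ab,bc', 'ab', '']
```

Note that a row with no match ends up as an empty string here, not NaN, because joining an empty list gives ''.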
Upvotes: 1
Reputation: 82755
Use str.extract with the list joined by |.
Ex:
import pandas as pd
lst =['ab','bc','ca']
df = pd.DataFrame({"A": ["ajbijio_ab_jadds", "bhjbj_ab_jiui", "Hello World"]})
df["Found"] = df["A"].str.extract("(" + "|".join(lst) + ")")
print(df)
Output:
A Found
0 ajbijio_ab_jadds ab
1 bhjbj_ab_jiui ab
2 Hello World NaN
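If only a True/False mask is wanted, as in the str.contains attempt from the question, the same joined pattern can be passed there, since str.contains expects a single string pattern rather than a list. A minimal sketch, reusing the data from above:

```python
import pandas as pd

lst = ["ab", "bc", "ca"]
df = pd.DataFrame({"A": ["ajbijio_ab_jadds", "bhjbj_ab_jiui", "Hello World"]})

# str.contains takes one regex string, so join the list with "|"
mask = df["A"].str.lower().str.contains("|".join(lst))
print(mask.tolist())  # [True, True, False]

# The mask can then be used to filter the rows that matched
matched = df[mask]
```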
Upvotes: 1