table_101

Reputation: 139

How to find which substring from a list appears in a DataFrame column?

I am trying to find which substring from a list occurs in each value of a DataFrame column.

list =['ab', 'bc', 'ca']

DF1
Index|A
0    |ajbijio_ab_jadds
1    |bhjbj_ab_jiui

Expected OUTPUT:
DF
ab
ab

I have written the following, but it raises the error unhashable type: 'list':

DF1['A'].str.lower().str.contains(list)

Upvotes: 1

Views: 59

Answers (3)

BENY

Reputation: 323226

I would use str.findall with the list joined by | and take the first match:

df["Found"] = df["A"].str.findall("|".join(lst)).str[0]

df
Out[82]: 
                  A Found
0  ajbijio_ab_jadds    ab
1     bhjbj_ab_jiui    ab
2       Hello World   NaN
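
A self-contained version of the above, as a minimal sketch. The sample frame is borrowed from the other answers in this thread, and re.escape is an addition that is not in the one-liner above: it only matters if the list could contain regex metacharacters such as . or +.

```python
import re

import pandas as pd

lst = ["ab", "bc", "ca"]
df = pd.DataFrame({"A": ["ajbijio_ab_jadds", "bhjbj_ab_jiui", "Hello World"]})

# Build one alternation pattern: ab|bc|ca
# re.escape is an assumption-driven safeguard for literal matching.
pattern = "|".join(re.escape(s) for s in lst)

# findall returns a list of matches per row; .str[0] takes the first one
# and yields NaN for rows where the list is empty.
df["Found"] = df["A"].str.findall(pattern).str[0]
print(df)
```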

Upvotes: 1

jezrael

Reputation: 862406

Use Series.str.extract if you need only the first match, joining the list with | for a regex OR:

L =['ab','bc','ca']

df['new'] = df['A'].str.extract('('+ '|'.join(L) + ')')
print (df)
                  A new
0  ajbijio_ab_jadds  ab
1     bhjbj_ab_jiui  ab

If you need all matches, use Series.str.findall with Series.str.join:

df['new'] = df['A'].str.findall('|'.join(L)).str.join(',')
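
To see the difference from the first-match approach, here is a small sketch where one row contains two of the substrings. The second row is a made-up value for illustration only and is not from the question.

```python
import pandas as pd

L = ["ab", "bc", "ca"]

# The second value is hypothetical, chosen so that two substrings match.
df = pd.DataFrame({"A": ["ajbijio_ab_jadds", "x_ab_y_ca"]})

# findall collects every non-overlapping match per row,
# and str.join flattens each list into one comma-separated string.
df["new"] = df["A"].str.findall("|".join(L)).str.join(",")
print(df["new"].tolist())  # ['ab', 'ab,ca']
```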

Upvotes: 1

Rakesh

Reputation: 82755

Use str.extract with the list joined by | as a regex alternation.

Example:

import pandas as pd

lst =['ab','bc','ca']

df = pd.DataFrame({"A": ["ajbijio_ab_jadds", "bhjbj_ab_jiui", "Hello World"]})
df["Found"] = df["A"].str.extract("(" + "|".join(lst) + ")")
print(df)

Output:

                  A Found
0  ajbijio_ab_jadds    ab
1     bhjbj_ab_jiui    ab
2       Hello World   NaN
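
If only a yes/no answer per row is needed, the str.contains attempt from the question can be adapted in the same way, since str.contains expects a single pattern string rather than a list. A minimal sketch, reusing the sample frame above (the name mask is just illustrative):

```python
import pandas as pd

lst = ["ab", "bc", "ca"]
df = pd.DataFrame({"A": ["ajbijio_ab_jadds", "bhjbj_ab_jiui", "Hello World"]})

# Pass one regex string (ab|bc|ca) instead of the list itself.
# The result is a boolean Series, not the matched substring.
mask = df["A"].str.lower().str.contains("|".join(lst))
print(mask.tolist())  # [True, True, False]
```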

Upvotes: 1
