Reputation: 147
I want to search a DataFrame column for partial strings that I have saved in a Series, and create a new column holding the substring that was found in each row. Part of my question was solved by "pandas: test if string contains one of the substrings in a list":
For example, say I have the series s = pd.Series(['cat','hat','dog','fog','pet']), and I want to find all places where s contains any of ['og', 'at'], I would want to get everything but pet.
The solution is:
>>> searchfor = ['og', 'at']
>>> s[s.str.contains('|'.join(searchfor))]
0 cat
1 hat
2 dog
3 fog
dtype: object
But I would like to get:
pet contains
0 cat at
1 hat at
2 dog og
3 fog og
dtype: object
Upvotes: 2
Views: 185
Reputation: 862591
Use str.extract. Rows with no match get NaN, so add dropna to remove them:
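searchfor = ['og', 'at']
# Capture the first matching substring; rows with no match become NaN
df['new'] = df['pet'].str.extract('(' + '|'.join(searchfor) + ')', expand=False)
# Drop the rows where nothing matched
df = df.dropna(subset=['new'])
print(df)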
pet contains1 new
0 cat at at
1 hat at at
2 dog og og
3 fog og og
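For anyone who wants to run this end to end, here is a minimal self-contained sketch. The sample DataFrame is an assumption, rebuilt from the series in the question, and the column name contains is only illustrative. It also wraps each substring in re.escape, which is not part of the answer above but should help if the search strings ever contain regex metacharacters such as . or +.

```python
import re

import pandas as pd

# Assumed sample data, rebuilt from the series in the question
df = pd.DataFrame({'pet': ['cat', 'hat', 'dog', 'fog', 'pet']})
searchfor = ['og', 'at']

# Escape each substring so it is matched literally, then join the
# alternatives inside a single capture group: (og|at)
pattern = '(' + '|'.join(re.escape(s) for s in searchfor) + ')'

# With one capture group and expand=False, str.extract returns a Series
# holding the first match per row, or NaN where nothing matched
df['contains'] = df['pet'].str.extract(pattern, expand=False)

# Keep only the rows where one of the substrings was found
result = df.dropna(subset=['contains'])
print(result)
```

Note that str.extract keeps only the first match in each row, so a value containing several of the substrings reports just one of them.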
Upvotes: 2