Reputation: 53
I need Python code that takes each value in column x and searches for it as a substring within each value in column y. My example is below. If possible, I would like it to print the matched value and the name in a dictionary, or in some other form that I can convert to a Pandas dataframe with a column for each. I'm fairly new at this and keep getting errors. My code and the error are below.
matches=['cat','bat','fat']
names=['turtle','bigcats','hfat1']
for x in matches:
    if name.str.contains(x) == 1:
        print(name)
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Upvotes: 3
Views: 2602
Reputation: 294576
With Numpy's find
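import numpy as np
matches = np.array(['cat', 'bat', 'fat'])
names = np.array(['turtle', 'bigcats', 'hfat1'])
# np.char.find is the public name for the same find function (numpy.core is deprecated in NumPy 2)
# Compare every name with every match; find returns -1 where there is no hit
i, j = np.where(np.char.find(names[:, None], matches) > -1)
print(matches[j], names[i], sep='\n')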
['cat' 'fat']
['bigcats' 'hfat1']
Wrapped in a Pandas series
import pandas as pd
pd.Series(dict(zip(matches[j], names[i])))
cat    bigcats
fat      hfat1
dtype: object
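If you would rather have one row per match/name pair in a DataFrame (a dict keeps only one name per match), a minimal sketch along the same lines could look like this. The column names 'match' and 'name' are my own choice, not something from the question.

```python
import numpy as np
import pandas as pd

matches = np.array(['cat', 'bat', 'fat'])
names = np.array(['turtle', 'bigcats', 'hfat1'])

# hits[r, c] is True when matches[c] occurs somewhere in names[r]
hits = np.char.find(names[:, None], matches) > -1
i, j = np.where(hits)

# One row per (match, name) pair; the column names are illustrative
pairs = pd.DataFrame({'match': matches[j], 'name': names[i]})
print(pairs)
```

Because every pair gets its own row, a match that appears in several names is not collapsed the way it would be in a dict.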
Upvotes: 2
Reputation: 153570
Since you tagged this question as pandas:
import pandas as pd
import numpy as np
matches=['cat','bat','fat']
names=['turtle','bigcats','hfat1']
df = pd.DataFrame({'Name':names,'Matches':matches})
print(df)
Starting dataframe:
  Matches     Name
0     cat   turtle
1     bat  bigcats
2     fat    hfat1
Use str access with contains and a regex created by join:
df.loc[df.Name.str.contains('|'.join(df.Matches)),'Name'].tolist()
Output:
['bigcats', 'hfat1']
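If the match strings might contain regex metacharacters, or you also want to know which substring matched, something like the following sketch may help. It escapes each term with re.escape and pulls the hit out with str.extract; the 'Matched' column name is just an assumption for illustration.

```python
import re
import pandas as pd

matches = ['cat', 'bat', 'fat']
names = ['turtle', 'bigcats', 'hfat1']
df = pd.DataFrame({'Name': names, 'Matches': matches})

# Escape each term so characters like '.' or '+' are matched literally,
# and wrap the alternation in one capture group for str.extract
pattern = '(' + '|'.join(re.escape(m) for m in df['Matches']) + ')'

# extract gives the first substring that matched, or NaN when none did
df['Matched'] = df['Name'].str.extract(pattern, expand=False)
result = df.dropna(subset=['Matched'])[['Name', 'Matched']]
print(result)
```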
Upvotes: 4
Reputation: 610
I'm guessing this is what you're looking for? "name" is undefined because your second list is called "names", and your "if" statement should look like the one below to check whether a value is in a list:
matches=['cat','bat','fat']
names=['turtle','bigcats','hfat1']
for x in matches:
    if x in names:
        print(names)
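One caveat worth checking: on a list, "in" tests whether some element equals x exactly, while on a string it tests for a substring. A quick sketch of the difference, using any() to get the substring behaviour across the whole list (the variable names exact, substring and found are mine):

```python
matches = ['cat', 'bat', 'fat']
names = ['turtle', 'bigcats', 'hfat1']

# List membership: is there an element equal to 'cat'?
exact = 'cat' in names

# String membership: is 'cat' a substring of 'bigcats'?
substring = 'cat' in 'bigcats'

# Substring test across the whole list, once per match
found = {x: any(x in name for name in names) for x in matches}
print(exact, substring, found)
```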
Upvotes: 0
Reputation: 10346
I'm a little unsure about your question, but does this do what you want?
matches=['cat','bat','fat']
names=['turtle','bigcats','hfat1']
for x in matches:
    for name in names:
        if x in name:
            print(name)
Be aware that if you're using a pandas.Series and do series.str.contains(s), that will check whether s is in each value in series and return another Series with True or False for each. That's why you're getting the error: comparing that Series with an int just gives you another Series, and an if statement can't reduce a whole Series to a single True or False.
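To make that concrete, here is a small sketch that assumes names is held in a pandas Series (the question doesn't show how it was built). The boolean Series from str.contains is used as a mask rather than put directly in an if, and the hits are collected in a dict; regex=False treats each x as a literal substring.

```python
import pandas as pd

matches = ['cat', 'bat', 'fat']
names = pd.Series(['turtle', 'bigcats', 'hfat1'])  # assumed to be a Series

found = {}
for x in matches:
    # Boolean Series with one True/False per name
    mask = names.str.contains(x, regex=False)
    # .any() collapses the Series to a single bool that "if" can use
    if mask.any():
        found[x] = names[mask].tolist()

print(found)
```

From there, pd.Series(found) or a list of (match, name) pairs should convert to a DataFrame without much trouble.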
Upvotes: 0