bsam

Reputation: 53

Python match the string of one column to the substring of another column

I need Python code that takes each value in column x and searches for it as a substring within each value in column y. My example is below. If possible, I would like it to print the matched value and the name, either in a dictionary or in some form I can convert to a Pandas DataFrame with one column for each. I'm fairly new at this and keep getting errors. My code and the error are below.

matches=['cat','bat','fat']
names=['turtle','bigcats','hfat1']

for x in matches:
    if name.str.contains(x) == 1:
        print(name)

ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Upvotes: 3

Views: 2602

Answers (4)

piRSquared

Reputation: 294576

With Numpy's find

import numpy as np

matches = np.array(['cat', 'bat', 'fat'])
names = np.array(['turtle', 'bigcats', 'hfat1'])

# Broadcast names (as a column) against matches; find returns -1 where the substring is absent
i, j = np.where(np.char.find(names[:, None], matches) > -1)

print(matches[j], names[i], sep='\n')

['cat' 'fat']
['bigcats' 'hfat1']

Wrapped in a Pandas series

import pandas as pd

pd.Series(dict(zip(matches[j], names[i])))

cat    bigcats
fat      hfat1
dtype: object
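
One thing to watch with the dict-based Series above: a dict keeps only one name per match, so a substring that occurs in several names would lose all but the last. A minimal sketch that keeps every pair by building a DataFrame from the same index arrays follows. The extra name 'catfish' and the column names 'match' and 'name' are assumptions added purely for illustration.

```python
import numpy as np
import pandas as pd

matches = np.array(['cat', 'bat', 'fat'])
# 'catfish' is a hypothetical extra name, added so that 'cat' matches twice
names = np.array(['turtle', 'bigcats', 'hfat1', 'catfish'])

# Compare every name (rows) against every substring (columns);
# np.char.find returns -1 where the substring is absent
i, j = np.where(np.char.find(names[:, None], matches) > -1)

# One row per (match, name) pair, so repeated matches are preserved
pairs = pd.DataFrame({'match': matches[j], 'name': names[i]})
print(pairs)
```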

Upvotes: 2

Scott Boston

Reputation: 153570

Since you tagged this question as pandas:

import pandas as pd
import numpy as np

matches=['cat','bat','fat']
names=['turtle','bigcats','hfat1']

df = pd.DataFrame({'Name':names,'Matches':matches})
print(df)

Starting dataframe:

  Matches     Name
0     cat   turtle
1     bat  bigcats
2     fat    hfat1

Use str access with contains and regex created by join:

df.loc[df.Name.str.contains('|'.join(df.Matches)),'Name'].tolist()

Output:

['bigcats', 'hfat1']
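
The question also asks for the matched value alongside the name. One possible extension of the same regex idea, offered as a sketch rather than a definitive answer, is to put the joined pattern in a capture group and use str.extract. The 'Match' column name and the use of re.escape (to treat the substrings literally) are assumptions not present in the original answer.

```python
import re
import pandas as pd

matches = ['cat', 'bat', 'fat']
names = ['turtle', 'bigcats', 'hfat1']

df = pd.DataFrame({'Name': names})

# Escape each substring so regex metacharacters are treated literally,
# then wrap the alternation in a single capture group for str.extract
pattern = '(' + '|'.join(re.escape(m) for m in matches) + ')'

# str.extract gives the first substring found in each name, or NaN if none
df['Match'] = df['Name'].str.extract(pattern, expand=False)

# Keep only the rows where some substring was found
result = df.dropna(subset=['Match']).reset_index(drop=True)
print(result)
```

Note that this reports only the first substring found in each name; a name containing several of the substrings would need str.findall or an explicit loop instead.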

Upvotes: 4

I'm guessing this is what you're looking for? "name" is undefined because your second list is called "names", and your "if" statement should look like the one below to find a value in a list:

matches=['cat','bat','fat']
names=['turtle','bigcats','hfat1']

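for x in matches:
    # Note: "x in names" tests whether x is an element of the list (an exact match),
    # not whether x is a substring of one of the elements
    if x in names:
        print(names)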

Upvotes: 0

Nathan

Reputation: 10346

I'm a little unsure about your question, but does this do what you want?

matches=['cat','bat','fat']
names=['turtle','bigcats','hfat1']

for x in matches:
    for name in names:
        if x in name:
            print(name)

Be aware that if you're using a pandas.Series, series.str.contains(s) checks whether s is in each value of the series and returns another Series with True or False for each. That's why you're getting the error: comparing that Series with 1 gives yet another Series, and an if statement can't reduce a whole Series to a single True or False.
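
To make the point about the boolean Series concrete, here is a small sketch, assuming names is held in a pandas.Series. It reduces the mask with .any() before the if, and collects the results in a dictionary and then a DataFrame as the question asked. The variable names found and result, the column names, and the choice of regex=False (plain substring search) are all illustrative assumptions.

```python
import pandas as pd

matches = ['cat', 'bat', 'fat']
names = pd.Series(['turtle', 'bigcats', 'hfat1'])

found = {}
for x in matches:
    # mask is a boolean Series: one True/False per name
    mask = names.str.contains(x, regex=False)
    # .any() collapses the Series to a single bool, which "if" can handle
    if mask.any():
        found[x] = names[mask].tolist()

print(found)

# Flatten the dictionary into one row per (match, name) pair
result = pd.DataFrame(
    [(m, n) for m, matched in found.items() for n in matched],
    columns=['match', 'name'],
)
print(result)
```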

Upvotes: 0
