Reputation: 11
I have a list of names in df1 and I need to see if they match anywhere in df2. I know I probably need to use str.contains on each item and add one to a count, but I haven't figured out how to do this successfully.
for e in df2['People_separate']:
count = df1['People'].str.contains(e)
if count == True:
count += 1
return count
example: df1:
| People |
| -------- |
| A B / E F |
| A B / C D |
| E F |
df2 (looking to populate the 'counts' column:
| People_separate | Counts |
| --------------- | -------------|
| A B | 2 |
| C D | 1 |
| E F | 2 |
Upvotes: 1
Views: 198
Reputation: 59579
If the "not exact match" is really a requirement, then we form a search pattern to use with Series.str.extractall
, and take the value_counts of that extraction. This way if your search word is 'foo'
a word like 'foobar'
will still count as a match (because it contains 'foo'
).
The reindex
ensures the resulting Series also shows 0s for words that never matched.
import pandas as pd
df1 = pd.DataFrame({"People": ['A B / E F', 'A B / C D', 'E F']})
df2 = pd.DataFrame({"People_separate": ['A B', 'C D', 'E F', 'banana']})
pat = '(' + '|'.join(df2['People_separate']) + ')'
#(A B|C D|E F)
(df1['People'].str.extractall(pat)[0]
.value_counts()
.reindex(df2['People_separate'], fill_value=0))
People_separate
A B 2
C D 1
E F 2
banana 0
Name: 0, dtype: int64
Upvotes: 1
Reputation: 9941
You can split the rows by ' / ' with split
, then explode
to convert lists into rows, and then count values with value_counts
:
df['People'].str.split(' / ').explode().value_counts()
Output:
A B 2
E F 2
C D 1
Upvotes: 1