Reputation: 61
I have a DataFrame and I need to create a new column and fill it according to how many words from a list of words are found in a text. I'm trying the code below:
import re
import pandas as pd

df = pd.DataFrame({'item': ['a1', 'a2', 'a3'],
                   'text': ['water, rainbow', 'blue, red, white', 'country,school,magic']})
list_of_words = ['water', 'pasta', 'black', 'magic', 'glasses', 'school', 'book']
for index, row in df.iterrows():
    text = row['text']
    count_found_words = 0
    for word in list_of_words:
        found_words = re.findall(word, text)
        if len(found_words) > 0:
            count_found_words += 1
    df['found_words'] = count_found_words
This code does create a new column, but it fills every row with the last value of count_found_words from the loop.
Is there a right way to do this?
Upvotes: 1
Views: 73
Reputation: 14949
Or you can try:
# strip() handles the spaces that follow some of the commas (e.g. 'water, rainbow')
df['found_words'] = df.text.str.split(',').apply(
    lambda x: sum(i.strip() in list_of_words for i in x))
Upvotes: 1
Reputation: 18296
pattern = fr"\b({'|'.join(list_of_words)})\b"
df["found_words"] = df.text.str.findall(pattern).str.len()
This forms the regex \b(water|pasta|black|magic|glasses|school|book)\b, which matches any of the words in the list. str.findall collects every match in each row, and .str.len() reports the number of matches.
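Note that findall counts every occurrence, so a word repeated in the text is counted more than once, and a word containing regex metacharacters would change the pattern. If either matters, a variation could be sketched as below. The re.escape call, the non-capturing group and the set-based de-duplication are additions here, not part of the answer above:

```python
import re
import pandas as pd

df = pd.DataFrame({'item': ['a1', 'a2', 'a3'],
                   'text': ['water, rainbow', 'blue, red, white', 'country,school,magic']})
list_of_words = ['water', 'pasta', 'black', 'magic', 'glasses', 'school', 'book']

# Escape each word so characters such as '.' or '+' are matched literally.
pattern = r"\b(?:" + "|".join(re.escape(w) for w in list_of_words) + r")\b"

# str.findall returns a list of matches per row; set() drops repeated words.
df["found_words"] = df["text"].str.findall(pattern).apply(lambda m: len(set(m)))
```

With the sample data from the question this should give the same counts as the original pattern, since no word is repeated there.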
Upvotes: 2
Reputation: 13
You can define a function count_words that returns count_found_words and use df['found_words'] = df['text'].map(count_words)
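The answer does not spell out the function body, so the following is only a minimal sketch, assuming count_words takes the text of one row and reuses list_of_words from the question (re.escape is an added precaution, not something the answer mentions):

```python
import re
import pandas as pd

df = pd.DataFrame({'item': ['a1', 'a2', 'a3'],
                   'text': ['water, rainbow', 'blue, red, white', 'country,school,magic']})
list_of_words = ['water', 'pasta', 'black', 'magic', 'glasses', 'school', 'book']

def count_words(text):
    # Count how many words from list_of_words occur at least once in text.
    # re.escape keeps any regex metacharacters in a word literal.
    return sum(1 for word in list_of_words if re.search(re.escape(word), text))

# map calls count_words once per row, so each row gets its own count.
df['found_words'] = df['text'].map(count_words)
```

This keeps the logic of the loop in the question but returns one value per row instead of assigning a single number to the whole column.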
Upvotes: -1