Reputation: 1057
I have a DataFrame idf, shown below.
      feature_name  idf_weights
2488    kralendijk    11.221923
3059         night            0
1383         ebebf            0
I have another DataFrame, df:
                  message  Number of Words in each message
0  night kralendijk ebebf                                3
For each message in df, I want to look up the weight of every word in idf and store, in a new column, the number of words whose idf weight is greater than 0.
The output should look like this:
                  message  Number of Words in each message  Number of words with idf_score>0
0  night kralendijk ebebf                                3                                 1
Here is what I've tried so far, but it returns the total number of words instead of the number of words with idf_weights > 0:
words_weights = dict(idf[['feature_name', 'idf_weights']].values)
df['> zero'] = df['message'].apply(lambda x: count([words_weights.get(word, 11.221923) for word in x.split()]))
Output:
                  message  Number of Words in each message  Number of words with idf_score>0
0  night kralendijk ebebf                                3                                 3
Thank you.
Upvotes: 0
Views: 48
Reputation: 260640
Try using a list comprehension:
# set up a dictionary for easy feature->weight indexing
d = idf.set_index('feature_name')['idf_weights'].to_dict()
# {'kralendijk': 11.221923, 'night': 0.0, 'ebebf': 0.0}
df['> zero'] = [sum(d.get(w, 0)>0 for w in x.split()) for x in df['message']]
## OR, a slightly faster alternative
# df['> zero'] = [sum(1 for w in x.split() if d.get(w, 0)>0) for x in df['message']]
Output:
                  message  Number of Words in each message  > zero
0  night kralendijk ebebf                                3       1
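For anyone who wants to run this end to end, here is a minimal sketch of the dictionary approach on the question's sample data. The way idf and df are built by hand here is an assumption for illustration; in practice they would come from your own TF-IDF step.

```python
import pandas as pd

# Sample frames rebuilt by hand from the question (assumed construction)
idf = pd.DataFrame(
    {
        "feature_name": ["kralendijk", "night", "ebebf"],
        "idf_weights": [11.221923, 0.0, 0.0],
    },
    index=[2488, 3059, 1383],
)
df = pd.DataFrame({"message": ["night kralendijk ebebf"]})
df["Number of Words in each message"] = df["message"].str.split().str.len()

# feature -> weight lookup table
d = idf.set_index("feature_name")["idf_weights"].to_dict()

# For each message, count the words whose weight is strictly positive.
# Words that do not appear in idf fall back to a weight of 0.
df["> zero"] = [sum(d.get(w, 0) > 0 for w in x.split()) for x in df["message"]]
print(df)
```

The default passed to d.get matters: with a positive default, every unknown word would be counted, which would inflate the result.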
Upvotes: 3
Reputation: 120409
You can use str.findall: the goal here is to build a pattern from the feature names with a weight greater than 0, then find them in each message.
pattern = fr"({'|'.join(idf.loc[idf['idf_weights'] > 0, 'feature_name'])})"
df['Number of words with idf_score>0'] = df['message'].str.findall(pattern).str.len()
print(df)
# Output
                  message  Number of Words in each message  Number of words with idf_score>0
0  night kralendijk ebebf                                3                                 1
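One caveat with a plain alternation: it also matches feature names inside longer words, and feature names containing regex metacharacters would be read as pattern syntax. A hedged variant, assuming whole-word matches are what you want, escapes each name and adds word boundaries. The second sample message and the col variable are made up for illustration.

```python
import re
import pandas as pd

# Sample data; the second message is a hypothetical addition that shows
# a partial-word hit ("kralendijks") being ignored.
idf = pd.DataFrame(
    {
        "feature_name": ["kralendijk", "night", "ebebf"],
        "idf_weights": [11.221923, 0.0, 0.0],
    }
)
df = pd.DataFrame({"message": ["night kralendijk ebebf", "kralendijks night"]})

col = "Number of words with idf_score>0"
positive = idf.loc[idf["idf_weights"] > 0, "feature_name"]

if positive.empty:
    # An empty alternation would match at every position, so skip the regex.
    df[col] = 0
else:
    # Escape each name and require word boundaries on both sides.
    pattern = r"\b(?:" + "|".join(map(re.escape, positive)) + r")\b"
    df[col] = df["message"].str.findall(pattern).str.len()

print(df)
```

The non-capturing group (?:...) makes findall return the full matches; since only the number of matches is used, a capturing group would give the same count.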
Upvotes: 3