Divyank

Reputation: 1057

Counting words in one DataFrame column using weights from another DataFrame's column

I have a DataFrame idf as shown below.

       feature_name idf_weights
2488    kralendijk  11.221923
3059    night       0
1383    ebebf       0

I have another DataFrame, df:

     message                   Number of Words in each message  
0   night kralendijk ebebf          3 

For each message in df, I want to look up the idf weight of every word in idf and store, in a new column, the number of words whose idf weight is greater than 0.

The expected output looks like this:

    message                   Number of Words in each message   Number of words with idf_score>0
0   night kralendijk ebebf                 3                     1

Here is what I've tried so far, but it gives the total number of words instead of the number of words with idf_weights > 0:

words_weights = dict(idf[['feature_name', 'idf_weights']].values)
df['> zero'] = df['message'].apply(lambda x: count([words_weights.get(word, 11.221923) for word in x.split()]))

Output

     message                   Number of Words in each message   Number of words with idf_score>0
0   night kralendijk ebebf                 3                     3

Thank you.

Upvotes: 0

Views: 48

Answers (2)

mozway

Reputation: 260640

Try using a list comprehension:

# set up a dictionary for easy feature->weight indexing
d = idf.set_index('feature_name')['idf_weights'].to_dict()
# {'kralendijk': 11.221923, 'night': 0.0, 'ebebf': 0.0}

df['> zero'] = [sum(d.get(w, 0)>0 for w in x.split()) for x in df['message']]

## OR, slightly faster alternative
# df['> zero'] = [sum(1 for w in x.split() if d.get(w, 0)>0) for x in df['message']]

output:

                  message  Number of Words in each message  > zero
0  night kralendijk ebebf                                3       1
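
For anyone who wants to run this end to end, here is a minimal self-contained sketch of the dictionary approach above. The sample frames are rebuilt from the data in the question, and the helper name count_positive is hypothetical, introduced only for illustration. Unknown words default to a weight of 0, so they are not counted.

```python
import pandas as pd

# Sample data rebuilt from the question (index values copied as shown there)
idf = pd.DataFrame(
    {
        "feature_name": ["kralendijk", "night", "ebebf"],
        "idf_weights": [11.221923, 0.0, 0.0],
    },
    index=[2488, 3059, 1383],
)
df = pd.DataFrame({"message": ["night kralendijk ebebf"]})
df["Number of Words in each message"] = df["message"].str.split().str.len()

# feature -> weight lookup table
weights = idf.set_index("feature_name")["idf_weights"].to_dict()


def count_positive(message, weights):
    """Count the words in `message` whose weight is strictly positive.

    Hypothetical helper: words missing from `weights` are treated as weight 0.
    """
    return sum(1 for word in message.split() if weights.get(word, 0) > 0)


df["> zero"] = [count_positive(m, weights) for m in df["message"]]
print(df)
```

The key difference from the attempt in the question is the default passed to .get: using 0 instead of a positive number means a word that is absent from idf cannot be counted as having a positive weight.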

Upvotes: 3

Corralien

Reputation: 120409

You can use str.findall: the goal here is to create a list of feature names with a weight greater than 0 to find in each message.

pattern = fr"({'|'.join(idf.loc[idf['idf_weights'] > 0, 'feature_name'])})"
df['Number of words with idf_score>0'] = df['message'].str.findall(pattern).str.len()
print(df)

# Output
                  message  Number of Words in each message  Number of words with idf_score>0
0  night kralendijk ebebf                                3                                 1
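
One caveat with the pattern above: it matches substrings and does not escape regex metacharacters in the feature names. A hedged variant, assuming you want whole-word matches only, wraps the alternation in \b word boundaries and passes each name through re.escape. The second message below is made up purely to show the difference; it is not part of the question's data.

```python
import re

import pandas as pd

# Sample data rebuilt from the question, plus one hypothetical extra message
idf = pd.DataFrame(
    {
        "feature_name": ["kralendijk", "night", "ebebf"],
        "idf_weights": [11.221923, 0.0, 0.0],
    }
)
df = pd.DataFrame({"message": ["night kralendijk ebebf", "kralendijks night"]})

# Feature names with a strictly positive weight
positive = idf.loc[idf["idf_weights"] > 0, "feature_name"]

# Escape each name and require word boundaries on both sides,
# so "kralendijk" does not match inside "kralendijks"
pattern = r"\b(?:" + "|".join(map(re.escape, positive)) + r")\b"

df["Number of words with idf_score>0"] = (
    df["message"].str.findall(pattern).str.len()
)
print(df)
```

This sketch assumes at least one feature has a positive weight; with an empty selection the alternation would be empty and the pattern would need a separate guard.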

Upvotes: 3
