How to count words in a DataFrame using a word list?

Question

I have a question about counting words using Python.

My DataFrame has three columns: id, text, and word.

Here is an example table:

[Data Frame]

import pandas as pd

df = pd.DataFrame({
    "id": [
        "100",
        "200",
        "300"
    ],
    "text": [
        "The best part of Zillow is you can search/view thousands of home within a click of a button without even stepping out of your door.At the comfort of your home you can get all the details such as the floor plan, tax history, neighborhood, mortgage calculator, school ratings etc. and also getting in touch with the contact realtor is just a click away and you are scheduled for the home tour!As a first time home buyer, this website greatly helped me to study the market before making the right choice.",
        "I love all of the features of the Zillow app, especially the filtering options and the feature that allows you to save customized searches.",
        "Data is not updated spontaneously. Listings are still shown as active while the Mls shows pending or closed."
    ],
    "word": [
        "[best, word, door, subway, rain]",
        "[item, best, school, store, hospital]",
        "[gym, mall, pool, playground]"
    ]
})
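Note that in this example the word column holds strings that only look like lists (for instance "[best, word, door, subway, rain]"), not Python lists. If real lists are more convenient, here is a minimal sketch of parsing them, assuming every entry is bracketed and comma-separated exactly as shown. The column name word_list is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({
    "id": ["100", "200", "300"],
    "word": [
        "[best, word, door, subway, rain]",
        "[item, best, school, store, hospital]",
        "[gym, mall, pool, playground]",
    ],
})

# Strip the surrounding brackets, split on commas, and trim whitespace.
# Assumes entries look exactly like "[a, b, c]"; word_list is a hypothetical column name.
df["word_list"] = df["word"].apply(
    lambda s: [w.strip() for w in s.strip("[]").split(",") if w.strip()]
)

print(df["word_list"].tolist())
```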

I have already split the text to build a dictionary.

Now I want to check each row's word list against that row's text and count how often each word appears.

This is the result I want:

| id  | word dict                                            |
| --- | ---------------------------------------------------- |
| 100 | {best: 1, word: 0, door: 1, subway: 0, rain: 0}      |
| 200 | {item: 0, best: 0, school: 0, store: 0, hospital: 0} |
| 300 | {gym: 0, mall: 0, pool: 0, playground: 0}            |

How can I do this? Any help is appreciated.
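The approach the question describes (split the text into a word-frequency dictionary, then look up each listed word) can be sketched with collections.Counter. This is a minimal sketch, not the asker's actual code: count_listed_words is a hypothetical helper name, the text is tokenized on runs of word characters, and matching is case-sensitive.

```python
import re
from collections import Counter


def count_listed_words(text, words):
    """Count how often each word in `words` appears as a whole token in `text`."""
    # Tokenize on runs of word characters, so "door.At" yields "door" and "At".
    counts = Counter(re.findall(r"\w+", text))
    # Counter returns 0 for missing keys, so every listed word gets an entry.
    return {word: counts[word] for word in words}


text = ("The best part of Zillow is you can search/view thousands of home "
        "within a click of a button without even stepping out of your door.")
result = count_listed_words(text, ["best", "word", "door", "subway", "rain"])
print(result)  # {'best': 1, 'word': 0, 'door': 1, 'subway': 0, 'rain': 0}
```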

PacketLoss · Accepted Answer

We can use re to extract all of the words from the string in the word column. Note that \w+ matches runs of letters, digits, and underscores, so the brackets, commas, and spaces are skipped.

Then define a function that returns a dict mapping each listed word to the number of times it appears in the row's text, and apply it row-wise to store the result in a new column of the df.
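A quick check of what the extraction step produces for one cell of the example data:

```python
import re

# The "word" cell is a plain string, not a Python list.
cell = "[best, word, door, subway, rain]"

# \w+ matches runs of letters, digits and underscores,
# so the brackets, commas and spaces are skipped.
words = re.findall(r"\w+", cell)
print(words)  # ['best', 'word', 'door', 'subway', 'rain']
```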

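import re

def count_words(row):
    # Extract the words from the bracketed string, e.g. "[best, word, door]".
    words = re.findall(r'(\w+)', row['word'])
    # str.count counts (substring) occurrences of each word in this row's text.
    return {word: row['text'].count(word) for word in words}

# axis=1 passes each row to count_words; the returned dicts form the new column.
df['word_counts'] = df.apply(count_words, axis=1)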

Output

    id  ...                                        word_counts
0  100  ...  {'best': 1, 'word': 0, 'door': 1, 'subway': 0,...
1  200  ...  {'item': 0, 'best': 0, 'school': 0, 'store': 0...
2  300  ...  {'gym': 0, 'mall': 0, 'pool': 0, 'playground': 0}

[3 rows x 4 columns]
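Because str.count counts substrings, the function above would also count "rain" inside a word such as "training". If only whole words should be counted (an assumption about the desired behaviour, not part of the original answer), one possible variant wraps each word in \b word boundaries; for the question's example data it gives the same counts. The sample DataFrame below is hypothetical, chosen to show the difference.

```python
import re

import pandas as pd


def count_whole_words(row):
    """Return {word: whole-word occurrences in row['text']} for each listed word."""
    words = re.findall(r"\w+", row["word"])
    # \b anchors stop "rain" from matching inside "training";
    # re.escape guards against regex metacharacters in the word list.
    return {
        word: len(re.findall(rf"\b{re.escape(word)}\b", row["text"]))
        for word in words
    }


# Hypothetical sample data (not from the question).
df = pd.DataFrame({
    "id": ["100", "200"],
    "text": [
        "The best part is the view from your door.",
        "The training gym is open.",
    ],
    "word": ["[best, door, rain]", "[rain, gym, best]"],
})

df["word_counts"] = df.apply(count_whole_words, axis=1)
print(df["word_counts"].tolist())
# [{'best': 1, 'door': 1, 'rain': 0}, {'rain': 0, 'gym': 1, 'best': 0}]
# By contrast, "The training gym is open.".count("rain") returns 1.
```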
