Which of these functions is doing what I want it to?

Question

I am taking a course that asks me to count how many times a word appears in a column. I found a way to do it, but it returns a higher count than the 'official' solution.

This is the code:

official_solution = reviews.description.map(lambda desc: "tropical" in desc).sum()
my_way = reviews['description'].str.count('tropical').sum()
print('official solution ' + str(official_solution))
print('my way ' + str(my_way))

It is returning this:

official solution 3607
my way 3703

I think my solution is the correct one if I need to know how many times the word appears in total. For example, a single string can contain the word 'tropical' twice: the official way would count it once, while my solution would count it twice. Am I correct, or am I confused?

I'd appreciate any help clearing this up.

rjg · Accepted Answer

Yes, you are correct. To replicate the official result, you can use str.contains instead of str.count.

Whenever you need to check the behavior of some function, do not hesitate to make up a tiny dataframe and run your own experiment:

import pandas as pd

data = {'col1':['tropical tropical', 'nothing', 'tropical']}
df = pd.DataFrame(data)
print("count", df.col1.str.count('tropical').sum())
print("contains", df.col1.str.contains('tropical').sum())

## result
## count 3
## contains 2

which confirms your hunch.
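If you want to see where the gap between the two totals comes from, one option is to keep the per-row counts and look only at the repeats. This is a minimal sketch: the small reviews frame below is a made-up stand-in for your data, and the variable names are only illustrative.

```python
import pandas as pd

# Hypothetical stand-in for the real reviews data, which is not shown in the question.
reviews = pd.DataFrame({
    "description": [
        "tropical fruit with a tropical finish",
        "crisp and dry",
        "hints of tropical fruit",
    ]
})

# Number of occurrences of the word in each row.
per_row = reviews["description"].str.count("tropical")

total_occurrences = per_row.sum()        # what str.count(...).sum() reports
rows_with_word = (per_row > 0).sum()     # what the "in desc" / str.contains approach reports
repeat_occurrences = (per_row[per_row > 0] - 1).sum()  # occurrences beyond the first in a row

print("total occurrences", total_occurrences)
print("rows containing the word", rows_with_word)
print("repeat occurrences", repeat_occurrences)
```

The total always equals the number of matching rows plus the repeat occurrences, so on your data the repeats should account for the difference of 96 between 3703 and 3607.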
