Reputation: 333
I am taking a course and it is asking me to count how many times a word appears in a column. I figured out a way, but it seems to be counting more times than the 'official' solution.
This is the code:
official_solution = reviews.description.map(lambda desc: "tropical" in desc).sum()
my_way = reviews['description'].str.count('tropical').sum()
print('official solution ' + str(official_solution))
print('my way ' + str(my_way))
It is returning this:
official solution 3607
my way 3703
I think my solution is the correct one if I need to know how many times the word appears in total. For example, a single description can contain the word 'tropical' twice: the official way would count that as one, while my solution would count it as two. Am I correct, or am I confused?
I'd appreciate it if anyone could help me clear this up.
Upvotes: 0
Views: 40
Reputation: 579
Yes, you are correct. To replicate the official result, you can use contains instead of count.
Whenever you need to check the behavior of some function, do not hesitate to make up a tiny dataframe and run your own experiment:
import pandas as pd
data = {'col1':['tropical tropical', 'nothing', 'tropical']}
df = pd.DataFrame(data)
print("count", df.col1.str.count('tropical').sum())
print("contains", df.col1.str.contains('tropical').sum())
## result
## count 3
## contains 2
which confirms your hunch.
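If you want to see how the two numbers relate, here is a minimal sketch on the same toy data. The Series desc is a made-up stand-in for reviews['description'], and the variable names are only for illustration. It keeps the per-row counts from str.count and shows that counting the rows with at least one match gives the same figure as contains or the official "in" check:

```python
import pandas as pd

# Hypothetical stand-in for reviews['description']
desc = pd.Series(['tropical tropical', 'nothing', 'tropical'])

# Number of occurrences in each row: 2, 0, 1
per_row = desc.str.count('tropical')

# Total occurrences across all rows (what str.count(...).sum() reports)
total_occurrences = int(per_row.sum())

# Rows with at least one occurrence, computed three equivalent ways
rows_with_word = int((per_row > 0).sum())
rows_via_contains = int(desc.str.contains('tropical').sum())
rows_via_in = int(desc.map(lambda d: 'tropical' in d).sum())

print("total occurrences", total_occurrences)
print("rows containing the word", rows_with_word)
```

So the gap between your number and the official one is just the extra occurrences in rows that mention the word more than once. Which number is "right" depends on whether the exercise asks for total occurrences or for the number of descriptions that mention the word.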
Upvotes: 1