Reputation: 749
I am looking for something like this.
tweets = pd.DataFrame()
tweets['worldwide'] = [tweets['user.location'] == ["Worldwide", "worldwide", "WorldWide"]]
The new column 'worldwide' should hold boolean values (True, False) obtained by checking the column tweets['user.location'], which contains three different spellings of "worldwide".
I want True to be returned for all three spellings of "worldwide".
Upvotes: 1
Views: 3646
Reputation: 749
I settled on this as the final form:
tweets['worldwide'] = tweets['user.location'].str.lower().str.contains("worldwide")
and the final count emerged as:
tweets['worldwide'].value_counts()
False 4998
True 185
Name: worldwide, dtype: int64
Upvotes: 0
Reputation: 394099
IIUC, you want isin:
tweets['worldwide'] = tweets['user.location'].isin(["Worldwide", "worldwide", "WorldWide"])
This will return True for every row whose value matches any of the listed values:
In [229]:
df = pd.DataFrame({'Tweets':['worldwide', 'asdas', 'Worldwide', 'WorldWide']})
df
Out[229]:
Tweets
0 worldwide
1 asdas
2 Worldwide
3 WorldWide
In [230]:
df['Worldwide'] = df['Tweets'].isin(["Worldwide", "worldwide", "WorldWide"])
df
Out[230]:
Tweets Worldwide
0 worldwide True
1 asdas False
2 Worldwide True
3 WorldWide True
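Note that isin tests for exact equality against the listed values, so a spelling that is not in the list is not matched. A minimal sketch of that behaviour, using made-up sample values (the `locations` Series is hypothetical and not taken from the question's data):

```python
import pandas as pd

# Hypothetical sample values: "WORLDWIDE" and "Worldwide, Earth"
# are deliberately absent from the list passed to isin.
locations = pd.Series(["worldwide", "WORLDWIDE", "Worldwide, Earth", "asdas"])

# isin compares each value for exact equality with the listed spellings.
exact = locations.isin(["Worldwide", "worldwide", "WorldWide"])
print(exact.tolist())
```

Only the first value is flagged here, because the other spellings are not listed.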
However, I personally think there is more mileage in normalising the tweets so that you compare against a single representation: lowercase the tweets using str.lower and then use str.contains to test whether the tweets contain your word:
In [231]:
df['Worldwide'] = df['Tweets'].str.lower().str.contains("worldwide")
df
Out[231]:
Tweets Worldwide
0 worldwide True
1 asdas False
2 Worldwide True
3 WorldWide True
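If the column can contain missing values, which is an assumption here since the question does not show the data, str.contains yields a missing value for those rows by default. A hedged sketch of one way to handle that, using the case and na parameters of str.contains on a hypothetical `locations` Series:

```python
import pandas as pd

# Hypothetical sample values, including a missing location.
locations = pd.Series(["Worldwide", None, "WORLDWIDE!", "London"])

# case=False makes the match case-insensitive, so no explicit lowercasing
# is needed; na=False reports missing values as False rather than as missing.
mask = locations.str.contains("worldwide", case=False, na=False)
print(mask.tolist())
```

The resulting boolean column can then be counted with value_counts() or summed directly.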
Upvotes: 1