ambrish dhaka

Reputation: 749

DataFrame add boolean column by checking multiple parameters

I am looking for something like this.

tweets = pd.DataFrame()

tweets['worldwide'] = [tweets['user.location'] == ["Worldwide", "worldwide", "WorldWide"]]

The new column 'worldwide' should hold boolean values (True or False) based on the column tweets['user.location'], which contains three different spellings of "worldwide".

I want True to be returned for all three spellings of "worldwide".

Upvotes: 1

Views: 3646

Answers (2)

ambrish dhaka

Reputation: 749

This is the final form I settled on:

tweets['worldwide'] = tweets['user.location'].str.lower().str.contains("worldwide")

The resulting counts were:

tweets['worldwide'].value_counts()


False    4998
True      185
Name: worldwide, dtype: int64
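
One caveat with this approach: if 'user.location' has missing values, str.contains returns a missing value for those rows rather than False. A minimal sketch of one way to handle that, using the case and na parameters of str.contains (the sample data below is made up for illustration and is not the original tweets data):

```python
import pandas as pd

# Hypothetical sample data; None stands in for a tweet with no location.
tweets = pd.DataFrame(
    {"user.location": ["Worldwide", None, "worldwide ", "London", "WorldWide"]}
)

# case=False makes the match case-insensitive, so str.lower() is not needed.
# na=False maps missing locations to False instead of a missing value.
tweets["worldwide"] = tweets["user.location"].str.contains(
    "worldwide", case=False, na=False
)

print(tweets["worldwide"].value_counts())
```

With na=False the column holds only True and False, so value_counts and boolean indexing behave as expected.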

Upvotes: 0

EdChum

Reputation: 394099

If I understand correctly, you want isin:

tweets['worldwide'] = tweets['user.location'].isin(["Worldwide", "worldwide", "WorldWide"])

This returns True for each row whose value exactly matches one of the listed values:

In [229]:
df = pd.DataFrame({'Tweets':['worldwide', 'asdas', 'Worldwide', 'WorldWide']})
df

Out[229]:
      Tweets
0  worldwide
1      asdas
2  Worldwide
3  WorldWide

In [230]:
df['Worldwide'] = df['Tweets'].isin(["Worldwide", "worldwide", "WorldWide"])
df

Out[230]:
      Tweets Worldwide
0  worldwide      True
1      asdas     False
2  Worldwide      True
3  WorldWide      True

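However, I personally think there is more mileage in normalising the tweets so that you compare against a single representation: lowercase them with str.lower, then use str.contains to test whether each tweet contains your word. Note that str.contains matches substrings, so unlike isin it also returns True for longer strings that include "worldwide":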

In [231]:
df['Worldwide'] = df['Tweets'].str.lower().str.contains("worldwide")
df

Out[231]:
      Tweets Worldwide
0  worldwide      True
1      asdas     False
2  Worldwide      True
3  WorldWide      True
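
If you want the normalisation but still need an exact match, comparing the lowercased column with == is one option. A minimal sketch contrasting the two (the extra row 'Worldwide news' and the column names 'exact' and 'contains' are made up for illustration):

```python
import pandas as pd

# Same sample data as above, plus one hypothetical longer string.
df = pd.DataFrame(
    {"Tweets": ["worldwide", "asdas", "Worldwide", "WorldWide", "Worldwide news"]}
)

# Normalise once, then reuse the lowercased Series for both tests.
normalised = df["Tweets"].str.lower()

# Exact match: True only when the whole string is "worldwide".
df["exact"] = normalised == "worldwide"

# Substring match: True when "worldwide" appears anywhere in the string.
# regex=False treats the pattern as a literal string rather than a regex.
df["contains"] = normalised.str.contains("worldwide", regex=False)

print(df)
```

The two columns differ only on the last row, where the string contains the word but is not equal to it.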

Upvotes: 1
