Reputation: 4506
dataframe['Text'] = dataframe['Text'].apply(lambda x: ' '.join([item for item in x.lower().split() if item not in stopwords]))
I am removing stop words from the dataframe. The logic works fine, but when a row has an empty (null) Text value, it raises an error.
I have tried dropna(), but it drops the whole row even though the other columns contain data.
How can I add a condition to the logic above so that it skips rows where the Text column is null?
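A minimal reproduction of the error, assuming Python 3 and a missing value stored as NaN. The two-row frame and the stop-word list are made up for illustration, and the lambda uses `x.lower().split()`, the Python 3 spelling of the split:

```python
import numpy as np
import pandas as pd

stopwords = ['love', 'car', 'amazing']  # hypothetical stop-word list
dataframe = pd.DataFrame({'Text': ['I love this car', np.nan],
                          'col2': ['positive', 'positive']})

error = None
try:
    dataframe['Text'] = dataframe['Text'].apply(
        lambda x: ' '.join([item for item in x.lower().split() if item not in stopwords]))
except AttributeError as exc:
    # NaN is a float, and float has no .lower() method
    error = exc
    print(error)
```

The first row is processed without trouble; the exception comes from the NaN row, which is why only rows with a null Text cause the failure.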
Upvotes: 1
Views: 1798
Reputation: 863266
You can replace NaN with an empty list, which is not straightforward; use mask or combine_first with a Series built from empty lists:
import pandas as pd
pos_tweets = [('I love this car', 'positive'),
('This view is amazing', 'positive'),
('I feel great this morning', 'positive'),
('I am so excited about the concert', 'positive'),
(None, 'positive')]
df = pd.DataFrame(pos_tweets, columns=["Text", "col2"])
print(df)
Text col2
0 I love this car positive
1 This view is amazing positive
2 I feel great this morning positive
3 I am so excited about the concert positive
4 None positive
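stopwords = ['love','car','amazing']
# one separate empty list per row, so the Series length matches df.index
s = pd.Series([[] for _ in range(len(df))], index=df.index)
# split into lowercase tokens, then put an empty list wherever Text is null
df["Text"] = df["Text"].str.lower().str.split().mask(df["Text"].isnull(), s)
print(df)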
Text col2
0 [i, love, this, car] positive
1 [this, view, is, amazing] positive
2 [i, feel, great, this, morning] positive
3 [i, am, so, excited, about, the, concert] positive
4 [] positive
# drop stop words from each token list and join the remaining words back into a string
df['Text'] = df['Text'].apply(lambda x: ' '.join([item for item in x if item not in stopwords]))
print(df)
Text col2
0 i this positive
1 this view is positive
2 i feel great this morning positive
3 i am so excited about the concert positive
4 positive
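If null rows should simply be left alone instead of becoming empty strings, the condition the question asks for can go directly into the lambda. A minimal sketch, assuming missing values should stay null; the three-row frame is made up:

```python
import pandas as pd

stopwords = ['love', 'car', 'amazing']
df = pd.DataFrame({'Text': ['I love this car', None, 'This view is amazing'],
                   'col2': ['positive', 'positive', 'positive']})

# Only strings are processed; None/NaN values are returned unchanged
df['Text'] = df['Text'].apply(
    lambda x: ' '.join([item for item in x.lower().split() if item not in stopwords])
    if isinstance(x, str) else x)
print(df)
```

No rows are dropped, so the data in col2 is kept for every row, including the one with a null Text.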
Another solution:
stopwords = ['love','car','amazing']
# fill null rows with empty lists taken from a Series aligned on df.index
df["Text"] = df["Text"].str.lower().str.split().combine_first(pd.Series([[] for _ in range(len(df))], index=df.index))
print(df)
Text col2
0 [i, love, this, car] positive
1 [this, view, is, amazing] positive
2 [i, feel, great, this, morning] positive
3 [i, am, so, excited, about, the, concert] positive
4 [] positive
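The combine_first result above still holds token lists, so the same join step is needed afterwards. A self-contained sketch of the whole pipeline, using a shortened, made-up version of the sample data:

```python
import pandas as pd

stopwords = ['love', 'car', 'amazing']
df = pd.DataFrame({'Text': ['I love this car', None, 'This view is amazing'],
                   'col2': ['positive'] * 3})

# One separate empty list per row, aligned on df.index
empty_lists = pd.Series([[] for _ in range(len(df))], index=df.index)

# Null rows become [] after combine_first, so the join below never sees NaN
tokens = df['Text'].str.lower().str.split().combine_first(empty_lists)
df['Text'] = tokens.apply(lambda words: ' '.join(w for w in words if w not in stopwords))
print(df)
```

The null row ends up as an empty string, matching row 4 of the mask output.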
Upvotes: 1
Reputation: 5891
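Use this before your logic (dropna returns a new DataFrame, so assign the result back):
dataframe = dataframe.dropna(subset=['Text'], how='all')
With subset=['Text'], only rows whose Text value is null are dropped; missing values in other columns are ignored.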
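A small sketch of how subset changes what dropna removes, using a made-up frame where one row has a null Text and another has a null only in col2:

```python
import numpy as np
import pandas as pd

dataframe = pd.DataFrame({'Text': ['I love this car', None, 'This view is amazing'],
                          'col2': ['positive', 'positive', np.nan]})

# Only the Text column is checked for nulls; NaN in col2 does not drop a row
dataframe = dataframe.dropna(subset=['Text'])
print(dataframe)
```

Row 1 is removed because its Text is null, while row 2 survives despite the NaN in col2. A plain dropna() would have removed both.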
Upvotes: 1