Reputation: 13
I was following this question: Python remove stop words from pandas dataframe
but it doesn't work for me with a custom stop word list. Check out this code:
pos_tweets = [('I love this car', 'positive'),
('This view is amazing', 'positive'),
('I feel great this morning', 'positive'),
('I am so excited about the concert', 'positive'),
('He is my best friend', 'positive')]
import pandas as pd
test = pd.DataFrame(pos_tweets)
test.columns = ["tweet","col2"]
test["tweet"] = test["tweet"].str.lower().str.split()
stop = ['love','car','amazing']
test['tweet'].apply(lambda x: [item for item in x if item not in stop])
print(test)
The result is:
tweet col2
0 [i, love, this, car] positive
1 [this, view, is, amazing] positive
2 [i, feel, great, this, morning] positive
3 [i, am, so, excited, about, the, concert] positive
4 [he, is, my, best, friend] positive
The words love, car and amazing are still there. What am I missing?
Thanks!
Upvotes: 1
Views: 4288
Reputation: 863266
You need to assign the output back to the tweet column, because apply returns a new Series and does not modify the column in place:
test['tweet'] = test['tweet'].apply(lambda x: [item for item in x if item not in stop])
print (test)
tweet col2
0 [i, this] positive
1 [this, view, is] positive
2 [i, feel, great, this, morning] positive
3 [i, am, so, excited, about, the, concert] positive
4 [he, is, my, best, friend] positive
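For reference, here is a minimal self-contained sketch of the same fix, using only two of the sample rows. The names filtered, unchanged and result are just for illustration, and using a set for stop is my own choice (membership tests are faster than with a list), not something the question requires:

```python
import pandas as pd

# Two of the sample tweets from the question, shortened for illustration
tweets = [
    ("I love this car", "positive"),
    ("This view is amazing", "positive"),
]
test = pd.DataFrame(tweets, columns=["tweet", "col2"])
test["tweet"] = test["tweet"].str.lower().str.split()

# Assumption: a set instead of a list; a list works the same way here
stop = {"love", "car", "amazing"}

# apply() builds and returns a new Series; test["tweet"] is not touched yet
filtered = test["tweet"].apply(lambda words: [w for w in words if w not in stop])
unchanged = test["tweet"].tolist()

# Assigning the result back is what actually updates the DataFrame
test["tweet"] = filtered
result = test["tweet"].tolist()
print(test)
```

Calling apply without the assignment only computes the filtered lists and throws them away, which is why the original printout still showed the stop words.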
Upvotes: 1