Reputation: 1940
I am trying to do sentiment analysis on Twitter posts, and I am new to the topic. In the text preprocessing phase I ran into a problem removing frequent words from tweets. I want to remove the most frequent words, so I counted the most frequent terms in the tweets with
freq=pd.Series(' '.join(traindata['tweet']).split()).value_counts()[:10]
Then I converted the freq Series into a list:
freq=list(freq.index)
Up to this point, my result is showing
To filter the column by removing these frequently used words, I used the code below:
traindata['tweet']=traindata.apply(lambda x:" ".join(x for x in x.split() if x not in freq))
and I got the error below:
File "C:\Users\codemen\Anaconda3\lib\site-packages\pandas\core\generic.py", line 3614, in __getattr__
    return object.__getattribute__(self, name)
AttributeError: ("'Series' object has no attribute 'split'", 'occurred at index id')
Kindly help me figure out the problem. Thank you.
Upvotes: 1
Views: 8998
Reputation: 863351
I believe you need to specify the column for apply, otherwise it loops over all columns of the DataFrame and passes each one to the lambda as a whole Series, which has no split method:
f = lambda x:" ".join(x for x in x.split() if x not in freq)
traindata['tweet'] = traindata['tweet'].apply(f)
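For a quick check, here is a minimal self-contained sketch of the same idea. The sample tweets, the `top_n` value, and the `drop_frequent` helper name are made up for illustration and are not from the question; the frequent words are also kept in a set rather than a list, which is just a convenience for faster lookups.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's traindata
traindata = pd.DataFrame({
    "id": [1, 2, 3],
    "tweet": [
        "the cat sat on the mat",
        "the dog ate the bone",
        "a bird saw the cat",
    ],
})

top_n = 2  # assumption for this tiny sample; the question uses 10

# Join with a space so words from adjacent tweets are not glued together
all_words = pd.Series(" ".join(traindata["tweet"]).split())
freq = set(all_words.value_counts()[:top_n].index)


def drop_frequent(text):
    """Return text with every word found in freq removed."""
    return " ".join(word for word in text.split() if word not in freq)


# DataFrame.apply hands each whole column (a Series) to the function,
# so text.split() fails; this reproduces the error from the question.
error_message = None
try:
    traindata.apply(drop_frequent)
except AttributeError as exc:
    error_message = str(exc)

# Series.apply hands each cell (a string) to the function instead.
cleaned = traindata["tweet"].apply(drop_frequent)
print(cleaned.tolist())
```

With this sample the two most frequent words are "the" and "cat", so those are the ones stripped from every tweet.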
Upvotes: 0