KeyError when cleaning tweets column using stop words in python

Question

I have a data frame of tweets, and I'm trying to clean its 'tweet' column: remove stop words and apply lemmatization.

Below is my code:

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

stop_words = set(stopwords.words('english'))
lemmatizer = WordNetLemmatizer()

# split each tweet into a list of sentences
sentence = df['tweet'].apply(nltk.sent_tokenize)

# sentence now looks like:
# 0    ['country year happy']
# 1    ['wish happy year']
# 2    ['live year together']

for i in range(len(sentence)):
    words=nltk.word_tokenize(str(sentence[i]))
    words=[lemmatizer.lemmatize(word) for word in words if word not in
          set(stopwords.words('english'))]
    sentence[i]=' '.join(words)

The code above raises the following error (full traceback):

KeyError                                  Traceback (most recent call last)
 in
      1 for i in range(len(sentence)):
----> 2     words=nltk.word_tokenize(str(sentence[i]))
      3     words=[lemmatizer.lemmatize(word) for word in words if word not in set(stopwords.words('english'))]
      4     sentence[i]=' '.join(words)

~\anaconda3\lib\site-packages\pandas\core\series.py in __getitem__(self, key)
    869         key = com.apply_if_callable(key, self)
    870         try:
--> 871             result = self.index.get_value(self, key)
    872 
    873             if not is_scalar(result):

~\anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_value(self, series, key)
   4403         k = self._convert_scalar_indexer(k, kind="getitem")
   4404         try:
-> 4405             return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
   4406         except KeyError as e1:
   4407             if len(self) > 0 and (self.holds_integer() or self.is_boolean()):

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_value()

pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.Int64HashTable.get_item()

KeyError: 34
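The `Int64HashTable.get_item` frames at the bottom of the traceback suggest that pandas is looking up 34 as an index label. On a Series with an integer index, `sentence[i]` is a label lookup, not a positional one, so if rows were removed earlier (for example by filtering or `dropna()`), `range(len(sentence))` can produce a number that is no longer a label. The data frame's index isn't shown, so this is the likely cause rather than a certain one. A minimal pandas-only reproduction on made-up data:

```python
import pandas as pd

# Made-up tweets; dropping a row mimics earlier filtering of the real data frame
df = pd.DataFrame({"tweet": ["country year happy", "wish happy year", "live year together"]})
df = df.drop(index=1)  # index labels are now 0 and 2, but len(df) is 2

sentence = df["tweet"]
failed_key = None
try:
    for i in range(len(sentence)):  # yields positions 0, 1
        _ = sentence[i]             # label lookup: label 1 no longer exists
except KeyError as exc:
    failed_key = exc.args[0]

print(list(sentence.index))  # [0, 2]
print(failed_key)            # 1
```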

How can I fix the error?
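Assuming the cause is a gap in the integer index (so `sentence[i]` asks for a label that no longer exists), two ways to keep the original loop working are sketched below on made-up data: read and write rows by position with `.iloc`, or renumber the index with `reset_index(drop=True)` so labels match positions again. To keep the snippet runnable without NLTK downloads, `str.split` stands in for `nltk.word_tokenize`, a tiny hand-picked set stands in for `stopwords.words('english')`, and lemmatization is left out.

```python
import pandas as pd

# Hypothetical stand-in for set(stopwords.words('english'))
stop_words = {"the", "a", "is", "in"}

df = pd.DataFrame({"tweet": ["the country is happy", "drop me", "a happy year",
                             "live in the year together"]})
df = df.drop(index=1)  # index is now [0, 2, 3], as after earlier filtering

# Option 1: positional access with .iloc, for both reading and writing
cleaned = df["tweet"].copy()
for i in range(len(cleaned)):
    words = cleaned.iloc[i].split()  # stand-in for nltk.word_tokenize
    cleaned.iloc[i] = " ".join(w for w in words if w not in stop_words)

# Option 2: renumber the index so a label lookup like sentence[i] lines up with i
renumbered = df["tweet"].reset_index(drop=True)

print(list(renumbered.index))  # [0, 1, 2]
print(cleaned.tolist())        # ['country happy', 'happy year', 'live year together']
```

Renumbering the whole data frame with `df = df.reset_index(drop=True)` before building `sentence` has the same effect on the original loop.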

Also, how can I store the results in my data frame as a new column?
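One way to store the result as a new column, sketched under stated assumptions: put the cleaning steps in a function and pass it to `Series.apply`, which maps each value and keeps the original index labels, so gaps in the index no longer matter and the result can be assigned straight back to `df`. The function name `clean_tweet` and the column name `tweet_clean` are hypothetical. The `sent_tokenize` step is dropped here, since calling `str()` on its list output likely passes bracket and quote characters on to `word_tokenize`. `str.split` and a small stop-word set stand in for NLTK so the snippet runs on its own.

```python
from typing import Callable, Iterable

import pandas as pd


def clean_tweet(text: str, stop_words: Iterable[str],
                lemmatize: Callable[[str], str] = lambda w: w) -> str:
    """Remove stop words, lemmatize the remaining words and rejoin them.

    lemmatize defaults to a no-op; pass WordNetLemmatizer().lemmatize for real lemmas.
    """
    stop = set(stop_words)
    words = text.split()  # stand-in for nltk.word_tokenize
    # compare lowercased words, because NLTK's English stop-word list is lowercase
    return " ".join(lemmatize(w) for w in words if w.lower() not in stop)


# Made-up tweets with a non-contiguous index, like the one in the question
df = pd.DataFrame({"tweet": ["The country is happy", "wishing a happy year"]},
                  index=[10, 34])

# Hypothetical stop words; Series.apply forwards extra keyword arguments to clean_tweet
df["tweet_clean"] = df["tweet"].apply(clean_tweet, stop_words={"the", "is", "a"})
print(df)
```

With NLTK installed and its `stopwords` and `wordnet` data downloaded (`nltk.download('stopwords')`, `nltk.download('wordnet')`), the same call would be `df['tweet_clean'] = df['tweet'].apply(clean_tweet, stop_words=set(stopwords.words('english')), lemmatize=WordNetLemmatizer().lemmatize)`. Swapping `text.split()` for `nltk.word_tokenize(text)` also works but needs NLTK's tokenizer data as well.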
