ambrish dhaka

Reputation: 749

Twitter data can't be parsed from a text file

I am collecting Twitter data in a txt file via streaming, and I use this file for filtering and various queries in an IPython notebook. Sometimes, when the data file is large, the command gets stuck somewhere around 'text', one of the fields in the Twitter data. I need a way to handle the data so that I don't get stuck. Below is the command and what happens.

tweets_ISIS = tweets['text'].apply(lambda tweet: word_in_text('ISIS', tweets))

Here is the output:

    AttributeError                            Traceback (most recent call last)
    <ipython-input-34-444b712d99dc> in <module>()
    ----> 1 tweets_ISIS = tweets['text'].apply(lambda tweet: word_in_text('ISIS', tweets))

    /usr/lib64/python2.7/site-packages/pandas/core/series.pyc in apply(self, func, convert_dtype, args, **kwds)
       2167             values = lib.map_infer(values, lib.Timestamp)
       2168 
    -> 2169         mapped = lib.map_infer(values, f, convert=convert_dtype)
       2170         if len(mapped) and isinstance(mapped[0], Series):
       2171             from pandas.core.frame import DataFrame

    pandas/src/inference.pyx in pandas.lib.map_infer (pandas/lib.c:62578)()

    <ipython-input-34-444b712d99dc> in <lambda>(tweet)
    ----> 1 tweets_ISIS = tweets['text'].apply(lambda tweet: word_in_text('ISIS', tweets))

    <ipython-input-33-0ee00dabf341> in word_in_text(word, text)
          1 def word_in_text(word, text):
          2     word = word.lower()
    ----> 3     text = text.lower()
          4     match = re.search(word, text)
          5     if match:

    /usr/lib64/python2.7/site-packages/pandas/core/generic.pyc in __getattr__(self, name)
       2358                 return self[name]
       2359             raise AttributeError("'%s' object has no attribute '%s'" %
    -> 2360                                  (type(self).__name__, name))
       2361 
       2362     def __setattr__(self, name, value):

    AttributeError: 'DataFrame' object has no attribute 'lower'

I defined word_in_text as follows:

    import re

    def word_in_text(word, text):
        # Compare case-insensitively by lowercasing both strings.
        word = word.lower()
        text = text.lower()
        match = re.search(word, text)
        if match:
            return True
        return False

Upvotes: 0

Views: 114

Answers (2)

ambrish dhaka

Reputation: 749

I defined word_in_text as follows:

    import re

    def word_in_text(word, text):
        # Compare case-insensitively by lowercasing both strings.
        word = word.lower()
        text = text.lower()
        match = re.search(word, text)
        if match:
            return True
        return False

Upvotes: 0

SPKoder

Reputation: 1893

You are using tweets when I think you meant to use tweet. As written, the lambda passes the whole DataFrame to word_in_text() instead of its own argument, which is why text.lower() raises the AttributeError. Try:

tweets_ISIS = tweets['text'].apply(lambda tweet: word_in_text('ISIS', tweet))

Also, is apply() the right function to use here? Based on the limited context, it seems that map() might be the proper choice for running word_in_text() on each value in the text Series, but I can't tell for sure without a more complete, reproducible example.
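To make the fix concrete, here is a minimal sketch of the corrected call. The small DataFrame is made-up sample data standing in for the streamed tweets (an assumption, since the real file isn't shown), and word_in_text is condensed to a single return while keeping the same behaviour. It runs both apply() and map() on the text column so the two can be compared:

```python
import re

import pandas as pd


def word_in_text(word, text):
    """Return True if `word` occurs in `text`, ignoring case."""
    return re.search(word.lower(), text.lower()) is not None


# Hypothetical sample data; the real tweets come from the streamed file.
tweets = pd.DataFrame(
    {"text": ["ISIS in the news", "Nothing relevant", "report on isis"]}
)

# Pass the lambda's own argument (tweet), not the whole DataFrame (tweets).
with_apply = tweets["text"].apply(lambda tweet: word_in_text("ISIS", tweet))

# Series.map also calls the function once per value in the column.
with_map = tweets["text"].map(lambda tweet: word_in_text("ISIS", tweet))

print(with_apply.tolist())
print(with_map.tolist())
```

On a Series of plain strings like this, both calls should give the same boolean result. If the real text column contains missing values, text.lower() would still fail on them, so those rows may need to be dropped or filled first.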

Upvotes: 1
