Debbie

Reputation: 969

Python + dataframe : AttributeError: 'float' object has no attribute 'replace'

I am trying to write a function to do some text processing on the specified columns (description, event_name) of a Pandas dataframe. I wrote this code:

#removal of unreadable chars, unwanted spaces, words of at most length two from 'description' column and lowercase the 'description' column

def data_preprocessing(source):

    return source.replace('[^A-Za-z]',' ')
    #data['description'] = data['description'].str.replace('\W+',' ')
    return source.lower()
    return source.replace("\s\s+" , " ")
    return source.replace('\s+[a-z]{1,2}(?!\S)',' ')
    return source.replace("\s\s+" , " ")

data['description'] = data['description'].apply(lambda row: data_preprocessing(row))
data['event_name'] = data['event_name'].apply(lambda row: data_preprocessing(row))

It is giving the following error:

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-94-cb5ec147833f> in <module>()
----> 1 data['description'] = data['description'].apply(lambda row: data_preprocessing(row))
      2 data['event_name'] = data['event_name'].apply(lambda row: data_preprocessing(row))
      3 
      4 #df['words']=df['words'].apply(lambda row: eliminate_space(row))
      5 

~/anaconda3/envs/tensorflow/lib/python3.5/site-packages/pandas/core/series.py in apply(self, func, convert_dtype, args, **kwds)
   2549             else:
   2550                 values = self.asobject
-> 2551                 mapped = lib.map_infer(values, f, convert=convert_dtype)
   2552 
   2553         if len(mapped) and isinstance(mapped[0], Series):

pandas/_libs/src/inference.pyx in pandas._libs.lib.map_infer()

<ipython-input-94-cb5ec147833f> in <lambda>(row)
----> 1 data['description'] = data['description'].apply(lambda row: data_preprocessing(row))
      2 data['event_name'] = data['event_name'].apply(lambda row: data_preprocessing(row))
      data['description'] = data['description'].str.replace('\W+',' ')    
<ipython-input-93-fdfec5f52a06> in data_preprocessing(source)
      3 def data_preprocessing(source):
      4 
----> 5     return source.replace('[^A-Za-z]',' ')
      6     #data['description'] = data['description'].str.replace('\W+',' ')
      7     source = source.lower()

AttributeError: 'float' object has no attribute 'replace'

If I write the code in the following way, without a function, it works perfectly:

data['description'] = data['description'].str.replace('[^A-Za-z]',' ')

Upvotes: 0

Views: 11834

Answers (1)

Peter Leimbigler

Reputation: 11105

Two things to fix:

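First, when you apply a lambda function to a pandas Series, the lambda function is called once for each element of the Series, so data_preprocessing receives individual values rather than the column. The AttributeError means at least one of those values is a float, most likely a NaN marking a missing entry, and floats have no replace method. Note also that the built-in str.replace does not interpret regular expressions, so even for string elements your patterns would be matched literally. What I think you need is to apply your function to the entire Series in a vectorized manner, through the .str accessor.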
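A minimal sketch of the difference, using a made-up Series rather than the asker's data and assuming the float in question is a NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; the NaN stands in for a missing description.
s = pd.Series(["Hello, World!", np.nan], dtype=object)

# Element-wise: every value is passed to the lambda, including the float NaN.
try:
    s.apply(lambda value: value.replace(",", " "))
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)

# Vectorized: the .str accessor leaves missing values as NaN instead of failing.
cleaned = s.str.replace(",", " ", regex=False)
print(cleaned.tolist())
```

The first print shows the same AttributeError as in the question; the second shows the comma replaced in the string while the missing value passes through untouched.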

Second, your function has multiple return statements. As a result, only the first statement, return source.replace('[^A-Za-z]',' '), will ever run. What you need to do is make your preprocessing changes on the variable source inside your function, and finally return the modified source (or an intermediate variable) at the very end.
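To illustrate the second point with plain Python strings (both function names are made up for this example):

```python
def first_return_only(text):
    # Execution leaves the function here; the line below is never reached.
    return text.replace("-", " ")
    return text.lower()


def all_steps(text):
    # Each step reassigns the variable, and a single return ends the function.
    text = text.replace("-", " ")
    text = text.lower()
    return text


print(first_return_only("Big-Event"))  # Big Event
print(all_steps("Big-Event"))          # big event
```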

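To rewrite your function to operate on an entire pandas Series, replace every occurrence of "source." with "source.str.". On pandas 2.0 and later, Series.str.replace treats the pattern as a literal string unless regex=True is passed, so the calls below set it explicitly (the regex argument needs pandas 0.23 or newer). The new function definition:

def data_preprocessing(source):
    # Replace every non-letter character with a space
    source = source.str.replace(r'[^A-Za-z]', ' ', regex=True)
    source = source.str.lower()
    # Collapse runs of whitespace into a single space
    source = source.str.replace(r'\s\s+', ' ', regex=True)
    # Drop words of one or two letters that follow whitespace
    source = source.str.replace(r'\s+[a-z]{1,2}(?!\S)', ' ', regex=True)
    source = source.str.replace(r'\s\s+', ' ', regex=True)
    return source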

Then, instead of this:

data['description'] = data['description'].apply(lambda row: data_preprocessing(row))
data['event_name'] = data['event_name'].apply(lambda row: data_preprocessing(row))

Try this:

data['description'] = data_preprocessing(data['description'])
data['event_name'] = data_preprocessing(data['event_name'])
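For a quick check, here is a self-contained sketch of the whole approach. The sample DataFrame is invented for illustration, and the function passes regex=True explicitly, which recent pandas versions require for pattern matching:

```python
import numpy as np
import pandas as pd


def data_preprocessing(source):
    # Vectorized cleanup of a whole Series; missing values stay missing.
    source = source.str.replace(r'[^A-Za-z]', ' ', regex=True)
    source = source.str.lower()
    source = source.str.replace(r'\s\s+', ' ', regex=True)
    source = source.str.replace(r'\s+[a-z]{1,2}(?!\S)', ' ', regex=True)
    source = source.str.replace(r'\s\s+', ' ', regex=True)
    return source


# Hypothetical data, including one missing description.
data = pd.DataFrame({
    'description': ['Join us at the BIG event, 2018 edition', np.nan],
    'event_name': ['Data  Day', 'Tech-Talk'],
}, dtype=object)

data['description'] = data_preprocessing(data['description'])
data['event_name'] = data_preprocessing(data['event_name'])

print(data)
```

The first description should come out as "join the big event edition", and the missing one remains NaN rather than raising an error.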

Upvotes: 5
