deLaJU

Reputation: 45

AttributeError: 'list' object has no attribute 'map' when using .apply() on a DataFrame

This is what my DataFrame, called "emails", looks like (only one row, with columns 'text' and 'POS_Tag'):

print(emails)

I'm trying to use apply() on my dataframe by first defining the function as:

def extractGrammar(email):
    tag_count_data = pd.DataFrame(email['POS_Tag'].map(lambda x: Counter(tag[1] for tag in x)).to_list())

    # Print count Part of speech tag needed for Adjective, Adverbs, Nouns and Verbs 
    email = pd.concat([email, tag_count_data], axis=1).fillna(0)

    pos_columns = ['PRP','MD','JJ','JJR','JJS','RB','RBR','RBS', 'NN', 'NNS','VB', 'VBS', 'VBG','VBN','VBP','VBZ']
    for pos in pos_columns:
        if pos not in email.columns:
            email[pos] = 0

    email = email[['text'] + pos_columns]

    email['Adjectives'] = email['JJ'] + email['JJR'] + email['JJS']
    email['Adverbs'] = email['RB'] + email['RBR'] + email['RBS']
    email['Nouns'] = email['NN'] + email['NNS']
    email['Verbs'] = email['VB']  + email['VBS'] + email['VBG']  + email['VBN'] + email['VBP'] + email['VBZ'] 

    return email

I then tried to apply the function to my emails DataFrame like this:

emails = emails.apply(extractGrammar, axis=1)

But I keep getting this error:

AttributeError: 'list' object has no attribute 'map'

I have previously used the exact same block of code from 'extractGrammar' on CSV files with multiple rows of emails, except that I ran it step by step outside of a function, without apply(). I cannot figure out what went wrong.

Upvotes: 0

Views: 783

Answers (2)

deLaJU

Reputation: 45

In order to get the df with the tags that I posted in the question, and based on the kind guidance of LiamFiddler, I later proceeded as follows:

  1. Turned the Counter object into a dict using dict()
  2. Turned the dict into a Series
  3. Set the column values to be the column names, based on this answer
  4. Selected the tags that I need for my DataFrame

def extractGrammar(email): 
   # Updated: count the tags I need
   tag_count_data = Counter([x[1] for x in email['POS_Tag']])
  
   #Convert the Counter object to dict
   tag_count_dict = dict(tag_count_data)

   #Turning dict into Series
   email_tag = pd.DataFrame(pd.Series(tag_count_dict).fillna(0).rename_axis('Tag'))
   email_tag = email_tag.reset_index()

   #use set_index to set Tag column values to be column names
   email_tag= email_tag.set_index("Tag").T.reset_index(drop=True).rename_axis(None, axis=1) 
   
   #select Tags that I need
   pos_columns = ['PRP','MD','JJ','JJR','JJS','RB','RBR','RBS', 'NN', 'NNS','VB', 'VBS', 'VBG','VBN','VBP','VBZ']
   for pos in pos_columns:
     if pos not in email_tag.columns:
       email_tag[pos] = 0

   email_tag = email_tag[pos_columns] 

   return email_tag
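For comparison, here is a minimal sketch of a shorter route to the same per-row counts. It assumes each 'POS_Tag' cell holds a list of (word, tag) tuples, as in the question. The helper name count_tags and the sample row are hypothetical, and the tag list is copied from the code above. Returning a Series from the function lets apply(axis=1) assemble the count columns directly, so the dict/Series/transpose steps are not needed.

```python
from collections import Counter

import pandas as pd

# Tag list copied from the post above.
POS_COLUMNS = ['PRP', 'MD', 'JJ', 'JJR', 'JJS', 'RB', 'RBR', 'RBS',
               'NN', 'NNS', 'VB', 'VBS', 'VBG', 'VBN', 'VBP', 'VBZ']


def count_tags(email):
    """Hypothetical helper: takes one row, returns a Series of tag counts."""
    counts = Counter(tag for _, tag in email['POS_Tag'])
    # Tags that never occur in this row get a count of 0.
    return pd.Series({pos: counts.get(pos, 0) for pos in POS_COLUMNS})


# Made-up sample data in the shape described in the question.
emails = pd.DataFrame({
    'text': ['She writes very long emails'],
    'POS_Tag': [[('She', 'PRP'), ('writes', 'VBZ'), ('very', 'RB'),
                 ('long', 'JJ'), ('emails', 'NNS')]],
})

# Each returned Series becomes one row of the resulting DataFrame.
tag_counts = emails.apply(count_tags, axis=1)

result = pd.concat([emails[['text']], tag_counts], axis=1)
result['Adjectives'] = result[['JJ', 'JJR', 'JJS']].sum(axis=1)
result['Adverbs'] = result[['RB', 'RBR', 'RBS']].sum(axis=1)
result['Nouns'] = result[['NN', 'NNS']].sum(axis=1)
result['Verbs'] = result[['VB', 'VBS', 'VBG', 'VBN', 'VBP', 'VBZ']].sum(axis=1)
```

The same call should work unchanged on a DataFrame with many rows, since the function only ever sees one row at a time.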

Upvotes: 0

whege

Reputation: 1441

You get that error because when you apply() the extractGrammar() function to your DataFrame with axis=1, it passes each row of the DataFrame to the function as a Series. When you then access email['POS_Tag'], you do not get the entire column as a Series, but rather the contents of the POS_Tag cell for that row, which is a list. Lists do not have a map method. If you are trying to count the occurrences of the second element of each tuple in the POS_Tag column, you could try the following:

tag_count_data = Counter([x[1] for x in email['POS_Tag']])

This will give you a Counter of the second elements of the tags for that individual row.
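To see this for yourself, here is a small sketch that inspects what the function actually receives. The sample row and the helper name describe_row are made up for illustration. It assumes the 'POS_Tag' cell holds a list of (word, tag) tuples, as in the question.

```python
from collections import Counter

import pandas as pd

# Made-up one-row DataFrame in the shape described in the question.
emails = pd.DataFrame({
    'text': ['He runs fast'],
    'POS_Tag': [[('He', 'PRP'), ('runs', 'VBZ'), ('fast', 'RB')]],
})


def describe_row(email):
    """Hypothetical helper: report the types apply(axis=1) hands over."""
    cell = email['POS_Tag']
    return f"{type(email).__name__} / {type(cell).__name__}"


# With axis=1 the function is called once per row.
kinds = emails.apply(describe_row, axis=1)

# Take the first row by hand to look at the cell more closely.
row = emails.iloc[0]
has_map = hasattr(row['POS_Tag'], 'map')  # a plain list has no .map
tag_counts = Counter(tag for _, tag in row['POS_Tag'])
```

Here kinds.iloc[0] is the string "Series / list": the row arrives as a Series, and the cell inside it is an ordinary Python list, which is why .map fails while Counter works.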

Upvotes: 1
