Reputation: 45
This is what my DataFrame, called "emails", looks like (a single row with the columns 'text' and 'POS_Tag'):
print(emails)
I'm trying to use apply() on my DataFrame by first defining the function as:
def extractGrammar(email):
    tag_count_data = pd.DataFrame(email['POS_Tag'].map(lambda x: Counter(tag[1] for tag in x)).to_list())
    # Count the part-of-speech tags needed for adjectives, adverbs, nouns and verbs
    email = pd.concat([email, tag_count_data], axis=1).fillna(0)
    pos_columns = ['PRP','MD','JJ','JJR','JJS','RB','RBR','RBS', 'NN', 'NNS','VB', 'VBS', 'VBG','VBN','VBP','VBZ']
    for pos in pos_columns:
        if pos not in email.columns:
            email[pos] = 0
    email = email[['text'] + pos_columns]
    email['Adjectives'] = email['JJ'] + email['JJR'] + email['JJS']
    email['Adverbs'] = email['RB'] + email['RBR'] + email['RBS']
    email['Nouns'] = email['NN'] + email['NNS']
    email['Verbs'] = email['VB'] + email['VBS'] + email['VBG'] + email['VBN'] + email['VBP'] + email['VBZ']
    return email
I then tried to apply the function to my emails DataFrame like this:
emails = emails.apply(extractGrammar, axis=1)
I have just been getting this error:
AttributeError: 'list' object has no attribute 'map'
I have previously used the exact same block of code from the body of 'extractGrammar' on CSV files with multiple rows of emails. The difference is that I ran it step by step on the whole DataFrame, outside of a function and without apply(). I cannot figure out what has gone wrong here.
Upvotes: 0
Views: 783
Reputation: 45
To get the DataFrame of tag counts that I described in the question, and following the kind guidance of LiamFiddler, I later proceeded with:
from collections import Counter
import pandas as pd

def extractGrammar(email):
    # Count the tags in this row (second element of each tuple)
    tag_count_data = Counter([x[1] for x in email['POS_Tag']])
    # Convert the Counter object to a dict
    tag_count_dict = dict(tag_count_data)
    # Turn the dict into a Series, then into a one-column DataFrame indexed by tag
    email_tag = pd.DataFrame(pd.Series(tag_count_dict).fillna(0).rename_axis('Tag'))
    email_tag = email_tag.reset_index()
    # Use set_index and transpose so the Tag values become the column names
    email_tag = email_tag.set_index("Tag").T.reset_index(drop=True).rename_axis(None, axis=1)
    # Select the tags that I need, adding any that are missing with a count of 0
    pos_columns = ['PRP','MD','JJ','JJR','JJS','RB','RBR','RBS', 'NN', 'NNS','VB', 'VBS', 'VBG','VBN','VBP','VBZ']
    for pos in pos_columns:
        if pos not in email_tag.columns:
            email_tag[pos] = 0
    email_tag = email_tag[pos_columns]
    return email_tag
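A more compact variant is possible. This is only a sketch, not the code I actually ran: if the row function returns a pd.Series, apply(axis=1) assembles the per-row counts into a DataFrame directly. The function name tag_counts_row and the sample emails below are made up for illustration.

```python
from collections import Counter

import pandas as pd

POS_COLUMNS = ['PRP', 'MD', 'JJ', 'JJR', 'JJS', 'RB', 'RBR', 'RBS',
               'NN', 'NNS', 'VB', 'VBS', 'VBG', 'VBN', 'VBP', 'VBZ']


def tag_counts_row(email):
    # `email` is one row (a Series); email['POS_Tag'] is a list of (word, tag) tuples.
    counts = Counter(tag for _, tag in email['POS_Tag'])
    # A Counter returns 0 for missing keys, so absent tags need no special handling.
    return pd.Series({pos: counts[pos] for pos in POS_COLUMNS})


# Hypothetical sample data, for illustration only.
emails = pd.DataFrame({
    'text': ['She quickly wrote good code', 'Dogs bark'],
    'POS_Tag': [
        [('She', 'PRP'), ('quickly', 'RB'), ('wrote', 'VBD'),
         ('good', 'JJ'), ('code', 'NN')],
        [('Dogs', 'NNS'), ('bark', 'VBP')],
    ],
})

# One row of tag counts per email, with one column per tag in POS_COLUMNS.
tag_table = emails.apply(tag_counts_row, axis=1)

# Put the counts next to the original text.
result = pd.concat([emails[['text']], tag_table], axis=1)
print(result[['text', 'PRP', 'JJ', 'NN', 'NNS', 'VBP']])
```

Because every returned Series has the same index, the resulting columns come out in the order of POS_COLUMNS and no fillna or column reordering is needed afterwards.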
Upvotes: 0
Reputation: 1441
You get that error because when you apply() the extractGrammar() function to your DataFrame with axis=1, it passes each row of the DataFrame to the function as a Series. When you then access ['POS_Tag'] on that row, you do not get the entire column as a Series, but rather the contents of the POS_Tag cell for that row, which is a list. Lists do not have a map method. If you are trying to count the occurrences of the second element of each tuple in the POS_Tag column, you could try the following:
tag_count_data = Counter([x[1] for x in email['POS_Tag']])
This will give you a Counter of the second elements of the tags for that individual row.
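To make the difference concrete, here is a minimal sketch. It assumes a one-row DataFrame shaped like the one in the question; the sample sentence and its tags are invented for illustration.

```python
from collections import Counter

import pandas as pd

# Hypothetical sample data: one row, 'POS_Tag' holds a list of (word, tag) tuples.
emails = pd.DataFrame({
    "text": ["She quickly wrote good code"],
    "POS_Tag": [[("She", "PRP"), ("quickly", "RB"), ("wrote", "VBD"),
                 ("good", "JJ"), ("code", "NN")]],
})

# Selecting the column from the whole DataFrame gives a Series, which has .map().
column = emails["POS_Tag"]
print(type(column).__name__, hasattr(column, "map"))

# A single row is what apply(axis=1) hands to the function.
row = emails.iloc[0]
cell = row["POS_Tag"]
# The cell is a plain list, so calling cell.map(...) raises AttributeError.
print(type(cell).__name__, hasattr(cell, "map"))

# Counting the second element of each tuple works directly on the list.
tag_count_data = Counter(tag for _, tag in cell)
print(tag_count_data)
```

The same Counter expression can go inside the function passed to apply(), since the function only ever sees one row at a time.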
Upvotes: 1