Reputation: 101
I am loading a CSV into a pandas dataframe. One of the columns, "reviews", contains strings of text. I need to identify all the adjectives in this column for every row and then create a new column, "adjectives", that contains a list of the adjectives from that row's review.
I've tried using TextBlob and was able to tag the parts of speech for each review using the code below.
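import pandas as pd
from textblob import TextBlob

df = pd.read_csv('./data.csv')

def pos_tag(text):
    # Return a list of (word, tag) tuples, or None if tagging fails
    # (for example, when the cell is not a string)
    try:
        return TextBlob(text).tags
    except Exception:
        return None

df['pos'] = df['reviews'].apply(pos_tag)
df.to_csv('dataadj.csv', index=False)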
Upvotes: 5
Views: 6305
Reputation: 1514
You're almost there. TextBlob(text).tags returns a list of (word, tag) tuples. You just need to filter on the tag (JJ in your case).
You could do something like this:
def get_adjectives(text):
    blob = TextBlob(text)
    return [word for (word, tag) in blob.tags if tag == "JJ"]

df['adjectives'] = df['reviews'].apply(get_adjectives)
EDIT: To also capture adjectives in their comparative and superlative forms (JJR/JJS), replacing tag == "JJ" with tag.startswith("JJ") should work.
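If the "pos" column from the question is already computed, the same filter can be applied to those tags directly instead of re-tagging every review. The following is only a sketch under a few assumptions: adjectives_from_tags is a hypothetical helper name, the sample list is hand-written in the (word, tag) shape that TextBlob(text).tags returns (it is not real tagger output), and rows where tagging failed are assumed to hold None, as in the question's pos_tag.

```python
def adjectives_from_tags(tags):
    """Return the words whose tag starts with "JJ" (JJ, JJR, JJS).

    `tags` is expected to be a list of (word, tag) tuples, or None
    when tagging failed for that row.
    """
    if not tags:  # None or an empty list: nothing to filter
        return []
    return [word for word, tag in tags if tag.startswith("JJ")]


# Hand-written sample, NOT actual TextBlob output (assumption for illustration)
sample = [
    ("The", "DT"), ("best", "JJS"), ("cheap", "JJ"),
    ("phone", "NN"), ("is", "VBZ"), ("smaller", "JJR"),
]

print(adjectives_from_tags(sample))  # ['best', 'cheap', 'smaller']
print(adjectives_from_tags(None))    # []
```

With the question's dataframe this would presumably be used as df['adjectives'] = df['pos'].apply(adjectives_from_tags), which also keeps rows with missing reviews from raising an error.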
Upvotes: 7