youngguv

Reputation: 101

How to extract all adjectives from strings of text in a pandas dataframe?

I am loading a CSV into a pandas dataframe. One of the columns, "reviews", contains strings of text. I need to identify all the adjectives in this column for every row, and then create a new column, "adjectives", that holds a list of the adjectives found in that row's review.

I've tried using TextBlob and was able to tag the parts of speech for each review using the code below.

import pandas as pd
from textblob import TextBlob

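df = pd.read_csv('./data.csv')

def pos_tag(text):
    # Return a list of (word, tag) tuples, or None if TextBlob cannot
    # process the value (for example, a NaN from an empty CSV cell).
    try:
        return TextBlob(text).tags
    except Exception:
        return None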

df['pos'] = df['reviews'].apply(pos_tag)

df.to_csv('dataadj.csv', index=False)

Upvotes: 5

Views: 6305

Answers (1)

Nakor

Reputation: 1514

You're almost there. TextBlob(text).tags returns a list of (word, tag) tuples. You just need to filter on the tag (JJ, the tag for adjectives, in your case).

You could do something like this:

def get_adjectives(text):
    blob = TextBlob(text)
    # Keep only the words whose part-of-speech tag is JJ (adjective).
    return [word for (word, tag) in blob.tags if tag == "JJ"]

df['adjectives'] = df['reviews'].apply(get_adjectives)
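
One caveat: empty cells in a CSV are loaded by pandas as NaN (a float), and TextBlob raises a TypeError when given a non-string. Below is a minimal sketch of a guard for that case. The names get_adjectives_safe and toy_tagger are hypothetical, and toy_tagger is only a stand-in for TextBlob so the snippet runs without TextBlob or its corpora installed.

```python
def get_adjectives_safe(text, tagger):
    """Return the JJ-tagged words in text, or [] for non-string values.

    tagger is any callable that maps a string to (word, tag) pairs.
    With TextBlob it would be: lambda t: TextBlob(t).tags
    """
    if not isinstance(text, str):
        # NaN and other non-string cells yield an empty list.
        return []
    return [word for word, tag in tagger(text) if tag == "JJ"]


def toy_tagger(text):
    # Hypothetical stand-in tagger: a tiny lookup table, not a real POS tagger.
    lookup = {"great": "JJ", "cheaper": "JJR"}
    return [(word, lookup.get(word, "NN")) for word in text.split()]


print(get_adjectives_safe("great food cheaper", toy_tagger))
print(get_adjectives_safe(float("nan"), toy_tagger))
```

With the real tagger, this could presumably be applied as df['reviews'].apply(get_adjectives_safe, tagger=lambda t: TextBlob(t).tags), since Series.apply forwards extra keyword arguments to the function.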

EDIT: To also capture adjectives in their comparative and superlative forms (JJR/JJS), replacing tag == "JJ" with tag.startswith("JJ") should work.
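
To make the difference between the two filters concrete, here is a small sketch that runs on hand-written (word, tag) pairs. The sample_tags list is an illustrative assumption that mimics the shape of TextBlob(text).tags; it is not real tagger output, and the helper names are made up for this example.

```python
# Hand-written (word, tag) pairs in the same shape as TextBlob(text).tags.
# Illustrative only: a real tagger may assign different tags.
sample_tags = [
    ("The", "DT"), ("food", "NN"), ("was", "VBD"), ("good", "JJ"),
    ("but", "CC"), ("the", "DT"), ("service", "NN"), ("was", "VBD"),
    ("slower", "JJR"), ("than", "IN"), ("the", "DT"), ("worst", "JJS"),
    ("diner", "NN"),
]


def adjectives_exact(tags):
    # Matches only the base adjective tag JJ.
    return [word for word, tag in tags if tag == "JJ"]


def adjectives_all_forms(tags):
    # Matches JJ, JJR (comparative), and JJS (superlative).
    return [word for word, tag in tags if tag.startswith("JJ")]


print(adjectives_exact(sample_tags))      # ['good']
print(adjectives_all_forms(sample_tags))  # ['good', 'slower', 'worst']
```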

Upvotes: 7
