Parseltongue

Reputation: 11657

Apply function to rows, unpack dictionary into multiple columns

I am using NLTK's SentimentIntensityAnalyzer() on corpora stored in a column of a pandas DataFrame. Calling .polarity_scores() returns a dictionary with four keys and their values: neg, neu, pos, and compound.

I want to iterate over every row in the dataframe, compute a polarity score for the corpus contained in joined_corpus['body'], and unpack the resulting dictionary into four columns in the dataframe. I couldn't figure out a way to unpack multiple key:value pairs into columns in pandas, so I had to use the following for loop:

for index, row in joined_corpus.iterrows():
    sentiment = sid.polarity_scores(row['body'])
    joined_corpus.loc[index, 'neg'] = sentiment['neg']
    joined_corpus.loc[index, 'neu'] = sentiment['neu']
    joined_corpus.loc[index, 'pos'] = sentiment['pos']
    joined_corpus.loc[index, 'compound'] = sentiment['compound']
    print("sentiment calculated for "+ row['subreddit'] + "of" + str(sentiment))

This produces output like so:

sentiment calculated for 1200isplentyof{'neg': 0.067, 'neu': 0.745, 'pos': 0.188, 'compound': 1.0}
sentiment calculated for 2007scapeof{'neg': 0.092, 'neu': 0.77, 'pos': 0.138, 'compound': 0.9998}
sentiment calculated for 2b2tof{'neg': 0.123, 'neu': 0.768, 'pos': 0.109, 'compound': -0.9981}
sentiment calculated for 2healthbarsof{'neg': 0.096, 'neu': 0.762, 'pos': 0.142, 'compound': 0.9994}
sentiment calculated for 2meirl4meirlof{'neg': 0.12, 'neu': 0.709, 'pos': 0.171, 'compound': 0.9997}
sentiment calculated for 3DSof{'neg': 0.054, 'neu': 0.745, 'pos': 0.201, 'compound': 1.0}
sentiment calculated for 3Dprintingof{'neg': 0.056, 'neu': 0.812, 'pos': 0.131, 'compound': 1.0}
sentiment calculated for 3dshacksof{'neg': 0.055, 'neu': 0.804, 'pos': 0.141, 'compound': 1.0}
sentiment calculated for 40kLoreof{'neg': 0.123, 'neu': 0.747, 'pos': 0.13, 'compound': 0.9545}
sentiment calculated for 49ersof{'neg': 0.098, 'neu': 0.715, 'pos': 0.187, 'compound': 1.0}

However, this is slow because it doesn't use pandas' built-in apply method. Is there a way to avoid the for loop in this case?

Upvotes: 1

Views: 695

Answers (2)

BENY

Reputation: 323266

You can do this with apply:

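# Score every body; the result is a Series of dictionaries
sentiment = df['body'].apply(sid.polarity_scores)
# Expand each dictionary into columns and attach them to df
df = pd.concat([df, sentiment.apply(pd.Series)], axis=1)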

Then,

"sentiment calculated for "+df['subreddit']+'of'+ sentiment.astype(str)
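To see the expansion step end to end without downloading the VADER lexicon, here is a minimal sketch. The function fake_polarity_scores and the sample data are hypothetical stand-ins (not part of NLTK); the only thing assumed is that the real sid.polarity_scores returns a dictionary with the same four keys.

```python
import pandas as pd

def fake_polarity_scores(text):
    # Hypothetical stand-in for sid.polarity_scores: same keys, toy arithmetic.
    words = text.split()
    n = len(words) or 1
    pos = sum(w == "good" for w in words) / n
    neg = sum(w == "bad" for w in words) / n
    return {"neg": neg, "neu": 1 - pos - neg, "pos": pos, "compound": pos - neg}

# Made-up sample data for illustration only.
df = pd.DataFrame({"subreddit": ["a", "b"],
                   "body": ["good good bad", "bad day"]})

# Series of dictionaries, one per row.
sentiment = df["body"].apply(fake_polarity_scores)

# pd.Series turns each dictionary into a row, so the result is a DataFrame
# with one column per key; concat attaches it to the original columns.
expanded = pd.concat([df, sentiment.apply(pd.Series)], axis=1)
print(expanded.columns.tolist())
```

Swapping fake_polarity_scores for sid.polarity_scores should give the same column layout on real data.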

Upvotes: 1

jpp

Reputation: 164673

You can use a list comprehension for this:

res = [sid.polarity_scores(x) for x in df['body']]

for item in res:
    print(item)

You can also assign this list directly to a new column:

df['sentiment'] = [sid.polarity_scores(x) for x in df['body']]
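If you want four separate columns rather than one column of dictionaries, one option is to pass the list of dictionaries to the DataFrame constructor and join the result back. This is a sketch under assumptions: fake_polarity_scores and the sample data are hypothetical stand-ins used so the example runs without NLTK; the real sid.polarity_scores is only assumed to return a dictionary with the same keys.

```python
import pandas as pd

def fake_polarity_scores(text):
    # Hypothetical stand-in for sid.polarity_scores: same keys, toy arithmetic.
    words = text.split()
    n = len(words) or 1
    pos = sum(w == "good" for w in words) / n
    neg = sum(w == "bad" for w in words) / n
    return {"neg": neg, "neu": 1 - pos - neg, "pos": pos, "compound": pos - neg}

# Made-up sample data with a non-default index, for illustration only.
df = pd.DataFrame({"subreddit": ["a", "b"],
                   "body": ["good good bad", "bad day"]},
                  index=[10, 20])

# One dictionary per row, as in the list comprehension above.
res = [fake_polarity_scores(x) for x in df["body"]]

# A list of dictionaries becomes a DataFrame with one column per key.
# Reusing df.index keeps the rows aligned when joining.
scores = pd.DataFrame(res, index=df.index)
df = df.join(scores)
print(df.columns.tolist())
```

Passing index=df.index matters when df does not have the default 0..n-1 index; without it the join would align on mismatched labels.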

Upvotes: 1
