John Laudun

Reputation: 407

python pandas: Generate (three) cells from one cell

I have a simple dataframe consisting of some metadata in a few columns and then a column with a sentence in it. I would like to use textacy's SVO extractor to generate three new columns, one each for the subject, verb, and object. I am trying to do this in as pandas a way as possible:

metadata   sentence
1-0        Thank you so much, Chris.
1-1        And it's truly a great honor to be here. 
1-2        I have been blown away by this conference.
1-3        And I say that sincerely.

To do that, I tried this:

def svo(text):
    svotriple = textacy.extract.triples.subject_verb_object_triples(nlp(text))
    for item in svotriple:
        df['subject'] = str(item[0][-1])
        df['verb']    = str(item[1][-1])
        df['object']  = str(item[2])

df.apply(svo(df['sentence'].values[0]))

I've tried a couple of ways to get just the sentence as a string out of the sentence column. Most of them showed that I was actually getting a Series rather than a string. I want this to work row by row. My impulse was to go with a for loop, but I really want to try to do this the pandas way. (Not that my for loops were working terribly well.)

Upvotes: 0

Views: 53

Answers (1)

Nadir Belhaj

Reputation: 12073

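The way you use apply is incorrect. You call svo once, on the first sentence only, and pass its return value (None) to df.apply instead of passing the function itself. Inside svo you also assign directly to columns of the existing DataFrame on each iteration, which overwrites the previous values in every row. Instead, have svo return the values and let apply build the new columns.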
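
To see the overwriting concretely, here is a minimal sketch. The sentences and the `subjects` list are made-up stand-ins for what an extractor might return; textacy is not involved.

```python
import pandas as pd

df = pd.DataFrame({"sentence": ["I like tea.", "She reads books."]})

# Hypothetical stand-in for the subjects an extractor might find,
# one per sentence.
subjects = ["I", "She"]

for subject in subjects:
    # Assigning a scalar to a column broadcasts it to every row,
    # so each pass through the loop replaces the whole column.
    df["subject"] = subject

overwritten = df["subject"].tolist()
print(overwritten)  # every row ends up holding the last subject seen
```

This is why the values have to come back from the function per row rather than being written into `df` from inside it.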

Try it this way:

import pandas as pd
import textacy
import spacy

nlp = spacy.load('en_core_web_sm')

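def svo(text):
    # Each triple is a (subject, verb, object) tuple of token sequences.
    svotriples = textacy.extract.triples.subject_verb_object_triples(nlp(text))
    for item in svotriples:
        subject = str(item[0][-1])  # last token of the subject
        verb = str(item[1][-1])     # last token of the verb group
        obj = ' '.join(tok.text for tok in item[2])  # full object phrase
        # Return only the first triple so every row yields exactly three values.
        return [subject, verb, obj]
    # No triple found: keep the row, with empty cells.
    return [None, None, None]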

data = {
    'sentence': [
        'Thank you so much, Chris.',
        "And it's truly a great honor to be here.",
        'I have been blown away by this conference.',
        'And I say that sincerely.'
    ]
}

df = pd.DataFrame(data)

df[['subject', 'verb', 'object']] = df['sentence'].apply(svo).apply(pd.Series)

print(df)
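
If a sentence can produce several triples (or none) and you want to keep all of them, one option is to return a list per sentence and use explode to get one row per triple. The sketch below is only an illustration of that pattern: `toy_triples` is a hypothetical stand-in for the textacy call, and the sentences are invented so the example runs without a spaCy model.

```python
import pandas as pd

def toy_triples(text):
    """Hypothetical stand-in for an SVO extractor.

    Returns a list of (subject, verb, object) tuples, possibly empty.
    """
    lookup = {
        "I like tea and she reads books.": [
            ("I", "like", "tea"),
            ("she", "reads", "books"),
        ],
        "Thank you.": [],
        "We saw it.": [("We", "saw", "it")],
    }
    return lookup.get(text, [])

df = pd.DataFrame({
    "metadata": ["1-0", "1-1", "1-2"],
    "sentence": ["I like tea and she reads books.", "Thank you.", "We saw it."],
})

# One list of triples per sentence, then one row per triple.
exploded = df.assign(triple=df["sentence"].apply(toy_triples)).explode("triple")

# explode turns an empty list into NaN, so drop sentences with no triple
# and rebuild a clean 0..n-1 index.
exploded = exploded.dropna(subset=["triple"]).reset_index(drop=True)

# Expand each tuple into three named columns.
exploded[["subject", "verb", "object"]] = pd.DataFrame(
    exploded["triple"].tolist(), index=exploded.index
)
result = exploded.drop(columns="triple")
print(result)
```

The metadata column is repeated for each triple, so every row still points back to the sentence it came from.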

Upvotes: 1
