zara kolagar

Reputation: 901

pandas: return the results as a list from the function instead of applying it to the df

I have a dataframe like the following:

df = pd.DataFrame({ 'text':['the weather is nice though', 'How are you today','the beautiful girl and the nice boy'],
'pos':["['DET', 'NOUN', 'VERB','ADJ', 'ADV']","['QUA', 'VERB', 'PRON', 'ADV']", "['DET', 'ADJ', 'NOUN','CON','DET', 'ADJ', 'NOUN' ]"]})

I have a function that, for every position where pos == 'ADJ', returns the corresponding word together with its index:

import ast

import pandas as pd

def extract_words(row):
    # map each adjective in the row's text to its word index
    word_pos = {}
    text_splited = row.text.split()
    # pos is stored as a string, so parse it into a real list first
    pos = ast.literal_eval(row.pos)
    for i, p in enumerate(pos):
        if p == 'ADJ':
            word_pos[text_splited[i]] = i
    return word_pos

df['Third_column'] = df.apply(extract_words, axis=1)

What I would like to do is refactor the function so that I do not have to apply it to the df from outside, and can instead append the results to a list defined outside the function. So far I have tried this:

list_word_index = []

def extract_words(dataframe):
    for li in dataframe.text.str.split():
        for lis in dataframe.pos:
            for i, p in enumerate(ast.literal_eval(lis)):
                if p == 'nk':
                    ...
                    list_word_index.append(...)

extract_words(df)

I do not know how to fill in the ... part of the code.

Upvotes: 1

Views: 62

Answers (1)

Hagen

Reputation: 56

Here's how you could use the function to get a list back, based on your DataFrame:

from typing import List

import pandas as pd

# note: pos holds real lists here, not strings as in the question
df = pd.DataFrame({ 'text':['the weather is nice though', 'How are you today','the beautiful girl and the nice boy'],
'pos':[['DET', 'NOUN', 'VERB','ADJ', 'ADV'],['QUA', 'VERB', 'PRON', 'ADV'], ['DET', 'ADJ', 'NOUN','CON','DET', 'ADJ', 'NOUN' ]]})


def extract_words_to_list(df: pd.DataFrame) -> List:
    # iterate over dataframe row-wise
    tmp = []
    for _, row in df.iterrows():
        word_pos = {}
        text_splited = row.text.split()
        for i, p in enumerate(row.pos):
            if p == 'ADJ':
                word_pos[text_splited[i]] = i
        tmp.append(word_pos)
    return tmp

list_word_index = extract_words_to_list(df)
list_word_index # [{'nice': 3}, {}, {'beautiful': 1, 'nice': 5}]

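Though you could also just keep your original extract_words and convert the resulting column to a list. Since that function calls ast.literal_eval on row.pos, this works with the DataFrame from the question, where pos holds strings:

df['Third_column'] = df.apply(extract_words, axis=1)
df['Third_column'].tolist() # [{'nice': 3}, {}, {'beautiful': 1, 'nice': 5}]

This achieves the same thing.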
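
As a further sketch (not part of the approaches above), the same list can be built without iterrows or apply by zipping the two columns. The names parse_tags, extract_tag_positions and df_strings are hypothetical helpers introduced only for this illustration; the code assumes pos may be stored either as real lists or as their string representation, as in the question.

```python
import ast

import pandas as pd


def parse_tags(value):
    """Return POS tags as a list, whether stored as a list or as its string repr."""
    return ast.literal_eval(value) if isinstance(value, str) else list(value)


def extract_tag_positions(df: pd.DataFrame, tag: str = "ADJ") -> list:
    """Build one {word: index} dict per row for the words whose POS equals `tag`."""
    result = []
    for text, pos in zip(df["text"], df["pos"]):
        words = text.split()
        # zip stops at the shorter sequence, so a length mismatch
        # between words and tags cannot raise an IndexError
        result.append(
            {word: i
             for i, (word, p) in enumerate(zip(words, parse_tags(pos)))
             if p == tag}
        )
    return result


# same data as in the question, with pos stored as strings
df_strings = pd.DataFrame({
    "text": ["the weather is nice though", "How are you today",
             "the beautiful girl and the nice boy"],
    "pos": ["['DET', 'NOUN', 'VERB','ADJ', 'ADV']",
            "['QUA', 'VERB', 'PRON', 'ADV']",
            "['DET', 'ADJ', 'NOUN','CON','DET', 'ADJ', 'NOUN' ]"],
})

list_word_index = extract_tag_positions(df_strings)
print(list_word_index)  # [{'nice': 3}, {}, {'beautiful': 1, 'nice': 5}]
```

Note that, as with the versions above, each result is keyed by word, so an adjective that occurs twice in the same sentence keeps only its last index. If that matters for your data, collecting (word, index) tuples in a list instead of a dict would preserve every occurrence.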

Upvotes: 2
