Reputation: 901
I have a dataframe like the following:
df = pd.DataFrame({ 'text':['the weather is nice though', 'How are you today','the beautiful girl and the nice boy'],
'pos':["['DET', 'NOUN', 'VERB','ADJ', 'ADV']","['QUA', 'VERB', 'PRON', 'ADV']", "['DET', 'ADJ', 'NOUN','CON','DET', 'ADJ', 'NOUN' ]"]})
I have a function that returns each word whose POS tag is 'ADJ', together with its index in the sentence:
import ast
import pandas as pd

def extract_words(row):
    # Map each adjective in the row's text to its token index
    word_pos = {}
    text_splited = row.text.split()
    pos = ast.literal_eval(row.pos)  # 'pos' is stored as a string, so parse it into a list
    for i, p in enumerate(pos):
        if p == 'ADJ':
            word_pos[text_splited[i]] = i
    return word_pos

df['Third_column'] = ' '
df['Third_column'] = df.apply(extract_words, axis=1)
What I would like to do is refactor the function so that I no longer have to call df.apply on it from outside. Instead, the function should take the dataframe itself and append each result to a list defined outside the function. So far I have tried this:
list_word_index = []

def extract_words(dataframe):
    for li in dataframe.text.str.split():
        for lis in dataframe.pos:
            for i, p in enumerate(ast.literal_eval(lis)):
                if p == 'nk':
                    ...
                    list_word_index.append(...)

extract_words(df)
I do not know how to fill in the ... parts of the code.
Upvotes: 1
Views: 62
Reputation: 56
Here's how you could use the function to get a list back, based on your DataFrame:
from typing import List

import pandas as pd

# Note: 'pos' holds real lists here, not the string-encoded lists from the question
df = pd.DataFrame({ 'text':['the weather is nice though', 'How are you today','the beautiful girl and the nice boy'],
                    'pos':[['DET', 'NOUN', 'VERB','ADJ', 'ADV'],['QUA', 'VERB', 'PRON', 'ADV'], ['DET', 'ADJ', 'NOUN','CON','DET', 'ADJ', 'NOUN' ]]})

def extract_words_to_list(df: pd.DataFrame) -> List:
    # iterate over dataframe row-wise
    tmp = []
    for _, row in df.iterrows():
        word_pos = {}
        text_splited = row.text.split()
        for i, p in enumerate(row.pos):
            if p == 'ADJ':
                word_pos[text_splited[i]] = i
        # one dict per row, empty if the row has no adjective
        tmp.append(word_pos)
    return tmp

list_word_index = extract_words_to_list(df)
list_word_index # [{'nice': 3}, {}, {'beautiful': 1, 'nice': 5}]
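Though you could also just keep your original extract_words and use the following (that function calls ast.literal_eval, so it expects pos to be stored as strings, as in your original DataFrame):
df['Third_column'] = df.apply(extract_words, axis=1)
df['Third_column'].tolist() # [{'nice': 3}, {}, {'beautiful': 1, 'nice': 5}]
to achieve the same thing.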
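If you want a single function that takes the DataFrame and works directly on the string-encoded pos column from the question, something along these lines might do. This is only a sketch: the name extract_adj_positions and its tag parameter are made up for illustration. It pairs each text with its own pos value via zip, which avoids the nested loops over both columns in the attempt from the question (those would combine every text with every pos entry).

```python
import ast

import pandas as pd


def extract_adj_positions(frame, tag="ADJ"):
    """Return one {word: index} dict per row for tokens tagged `tag`.

    Hypothetical helper, not part of pandas.
    """
    results = []
    # zip walks both columns in lockstep, so each text meets only its own tags
    for text, pos in zip(frame["text"], frame["pos"]):
        # Parse string-encoded lists; leave real lists untouched
        tags = ast.literal_eval(pos) if isinstance(pos, str) else pos
        words = text.split()
        # Inner zip stops at the shorter sequence, so a length mismatch
        # between words and tags is silently truncated rather than raising
        results.append(
            {word: i for i, (word, t) in enumerate(zip(words, tags)) if t == tag}
        )
    return results


df = pd.DataFrame({
    "text": ["the weather is nice though", "How are you today",
             "the beautiful girl and the nice boy"],
    "pos": ["['DET', 'NOUN', 'VERB','ADJ', 'ADV']",
            "['QUA', 'VERB', 'PRON', 'ADV']",
            "['DET', 'ADJ', 'NOUN','CON','DET', 'ADJ', 'NOUN' ]"],
})

list_word_index = extract_adj_positions(df)
```

One thing to keep in mind with any of these versions: because the result is a dict keyed by word, a sentence that contains the same adjective twice keeps only the last index for it.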
Upvotes: 2