Reputation: 10021
Let's say I have a column in a dataframe that contains a list of lists:
id pos
0 1 [[['Malaysia','NR'], [':','PU'], ['Natural','JJ'], ['selling price','NN']]]
1 2 [[['Spot Price','NN'], [':','PU'], ['cotton','NN'], ['India', ' NR']]]
or in dictionary format:
[{'id': 1,
'pos': "[[['Malaysia','NR'], [':','PU'], ['Natural','JJ'], ['selling price','NN']]]"},
{'id': 2,
'pos': "[[['Spot Price','NN'], [':','PU'], ['cotton','NN'], ['India', ' NR']]]"}]
How could I keep only the elements whose second item is NR or NN, and then split (explode) the pos column row-wise as follows:
id words part_of_speech
0 1 Malaysia NR
1 1 selling price NN
2 2 Spot Price NN
3 2 cotton NN
4 2 India NR
How could I achieve this in Python? Thanks.
Trial code:
l = [[['Malaysia','NR'], [':','PU'], ['Natural','JJ'], ['selling price','NN']]]
for elem in l[0]:
    print(elem[1])
Out:
NR
PU
JJ
NN
Upvotes: 4
Views: 221
Reputation: 260735
Here is a working solution. It explodes first and filters afterwards, which I believe should be more efficient since it doesn't require an explicit Python loop:
# get rid of unnecessary level of nesting
df['pos'] = df['pos'].str[0]
# explode the list
df = df.explode('pos')
# split the two items to separate columns
df['words'] = df['pos'].str[0]
df['part_of_speech'] = df['pos'].str[1]
# filter output
df.drop('pos', axis=1)[df['part_of_speech'].isin(['NR', 'NN'])]
Output:
id words part_of_speech
0 1 Malaysia NR
0 1 selling price NN
1 2 Spot Price NN
1 2 cotton NN
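Note that the India row is missing from this output because its tag in the sample data is ' NR' with a leading space, so it does not match 'NR'. Below is a minimal, self-contained sketch of the same explode-then-filter idea. It assumes the pos column holds strings, as in the dictionary format from the question, so it parses them with ast.literal_eval first. Stripping whitespace from the tag is my own assumption about what the data is meant to be, and the names records and result are just for illustration.

```python
import ast

import pandas as pd

# Sample data in the "dictionary format" from the question, where each
# 'pos' value is a string rather than a real nested list.
records = [
    {"id": 1, "pos": "[[['Malaysia','NR'], [':','PU'], ['Natural','JJ'], ['selling price','NN']]]"},
    {"id": 2, "pos": "[[['Spot Price','NN'], [':','PU'], ['cotton','NN'], ['India', ' NR']]]"},
]
df = pd.DataFrame(records)

# Parse the strings into nested lists, then drop the outer level of nesting.
df["pos"] = df["pos"].apply(ast.literal_eval).str[0]

# One row per [word, tag] pair.
df = df.explode("pos", ignore_index=True)

df["words"] = df["pos"].str[0]
# Assumption: ' NR' is meant to be 'NR', so surrounding whitespace is stripped.
df["part_of_speech"] = df["pos"].str[1].str.strip()

# Keep only the NR and NN rows and the three columns of interest.
mask = df["part_of_speech"].isin(["NR", "NN"])
result = df.loc[mask, ["id", "words", "part_of_speech"]].reset_index(drop=True)
print(result)
```

If the column already contains real lists, the ast.literal_eval step can be dropped.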
Upvotes: 1
Reputation: 71580
You could try this with explode:
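import pandas as pd
# explode twice: once for the outer list, once for the [word, tag] pairs
x = df.explode('pos').explode('pos')
x = x[['id']].reset_index(drop=True).join(pd.DataFrame(x['pos'].tolist()).set_axis(['words', 'part_of_speech'], axis=1))
# keep only the NN and NR rows
x.loc[x['part_of_speech'].isin(['NN', 'NR'])]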
id words part_of_speech
0 1 Malaysia NR
3 1 selling price NN
4 2 Spot Price NN
6 2 cotton NN
7 2 India NR
This solution scales easily to dataframes of arbitrary length. Instead of assigning the columns one by one, it builds them all at once, so it would work for sublists of arbitrary length.
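For reference, here is a self-contained sketch of the double-explode approach that can be run as is. It assumes the pos column holds real nested lists (not strings) and that the India tag is 'NR' without a leading space, which is what the output above implies. The names pairs and result are just for illustration.

```python
import pandas as pd

# Sample frame with real nested lists in 'pos'.
# Assumption: the tag for India is 'NR' with no leading space.
df = pd.DataFrame({
    "id": [1, 2],
    "pos": [
        [[["Malaysia", "NR"], [":", "PU"], ["Natural", "JJ"], ["selling price", "NN"]]],
        [[["Spot Price", "NN"], [":", "PU"], ["cotton", "NN"], ["India", "NR"]]],
    ],
})

# The first explode removes the outer nesting level;
# the second yields one [word, tag] pair per row.
x = df.explode("pos").explode("pos").reset_index(drop=True)

# Build both new columns in one step from the list of pairs.
pairs = pd.DataFrame(x["pos"].tolist(), columns=["words", "part_of_speech"])

# Join the pairs back to the ids by position, then filter on the tag.
result = x[["id"]].join(pairs)
result = result.loc[result["part_of_speech"].isin(["NN", "NR"])]
print(result)
```

The index of the result keeps the row positions from before filtering. Add .reset_index(drop=True) if a clean 0..n-1 index is wanted, as in the question's expected output.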
Upvotes: 3