Filter list of list column then split (explode) row-wisely in Python

Question

Let's say I have a DataFrame with a column that contains a list of lists:

   id                                                pos
0   1  [[['Malaysia','NR'], [':','PU'], ['Natural','JJ'], ['selling price','NN']]]
1   2  [[['Spot Price','NN'], [':','PU'], ['cotton','NN'], ['India', ' NR']]]

or in dictionary format:

[{'id': 1,
  'pos': "[[['Malaysia','NR'], [':','PU'], ['Natural','JJ'], ['selling price','NN']]]"},
 {'id': 2,
  'pos': "[[['Spot Price','NN'], [':','PU'], ['cotton','NN'], ['India', ' NR']]]"}]
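In the dictionary form, pos holds a string rather than an actual list, so explode would have nothing to unnest. Assuming the data really arrives as strings (for example, read back from a CSV file), here is a minimal sketch that converts it into real nested lists first. It uses ast.literal_eval, which only evaluates Python literals and is therefore safer than eval:

```python
import ast

import pandas as pd

# Records from the question, where 'pos' is stored as a string
records = [
    {'id': 1,
     'pos': "[[['Malaysia','NR'], [':','PU'], ['Natural','JJ'], ['selling price','NN']]]"},
    {'id': 2,
     'pos': "[[['Spot Price','NN'], [':','PU'], ['cotton','NN'], ['India', ' NR']]]"},
]

df = pd.DataFrame(records)
# Parse each string into the nested list it represents
df['pos'] = df['pos'].apply(ast.literal_eval)
print(type(df.loc[0, 'pos']))
```

After this step, df has the list-of-lists shape shown in the table above.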

How could I keep only the pairs whose second element is NR or NN, and then split (explode) the pos column row-wise, as follows:

   id          words part_of_speech
0   1       Malaysia             NR
1   1  selling price             NN
2   2     Spot Price             NN
3   2         cotton             NN
4   2          India             NR

How could I achieve this in Python? Thanks.

Trial code:

l = [[['Malaysia','NR'], [':','PU'], ['Natural','JJ'], ['selling price','NN']]]
for elem in l[0]:
    print(elem[1])

Out:

NR
PU
JJ
NN
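The trial loop already reaches the tag through elem[1]. A small sketch (plain Python, no pandas) that extends it to keep only the NR/NN pairs; the strip() call is an addition that guards against stray spaces such as the ' NR' in the sample data:

```python
l = [[['Malaysia','NR'], [':','PU'], ['Natural','JJ'], ['selling price','NN']]]

# Tags to keep
keep = {'NR', 'NN'}
# Keep a [word, tag] pair only if its (whitespace-stripped) tag is in keep
filtered = [elem for elem in l[0] if elem[1].strip() in keep]
print(filtered)  # [['Malaysia', 'NR'], ['selling price', 'NN']]
```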

U13-Forward · Accepted Answer

You could try this with explode, applied twice to unnest both levels of the list:

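import pandas as pd

# Explode twice: outer list -> inner list of pairs -> one [word, tag] pair per row
x = df.explode('pos').explode('pos')
# Split each pair into two columns and attach the matching ids on a fresh index
x = x[['id']].reset_index(drop=True).join(pd.DataFrame(x['pos'].tolist()).set_axis(['words', 'part_of_speech'], axis=1))
x.loc[x['part_of_speech'].isin(['NN', 'NR'])]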

   id          words part_of_speech
0   1       Malaysia             NR
3   1  selling price             NN
4   2     Spot Price             NN
6   2         cotton             NN
7   2          India             NR

This solution scales easily to DataFrames of arbitrary length: instead of assigning columns one by one, it builds them all at once, so it also works no matter how many [word, tag] pairs each row contains.
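A self-contained sketch of the same double-explode approach, building df from the question's sample data. One addition not in the answer above: it strips whitespace from the tags before filtering, because in the question's data India is tagged ' NR' (with a leading space), which isin(['NN', 'NR']) would otherwise drop:

```python
import pandas as pd

# Sample data from the question, with 'pos' holding real nested lists
df = pd.DataFrame([
    {'id': 1,
     'pos': [[['Malaysia', 'NR'], [':', 'PU'], ['Natural', 'JJ'], ['selling price', 'NN']]]},
    {'id': 2,
     'pos': [[['Spot Price', 'NN'], [':', 'PU'], ['cotton', 'NN'], ['India', ' NR']]]},
])

# First explode: one row per inner list; second: one row per [word, tag] pair
x = df.explode('pos').explode('pos')
# Turn each pair into two columns, realigning the ids on a fresh 0..n-1 index
pairs = pd.DataFrame(x['pos'].tolist(), columns=['words', 'part_of_speech'])
x = x[['id']].reset_index(drop=True).join(pairs)
# Strip stray whitespace (e.g. ' NR') before filtering on the tag
x['part_of_speech'] = x['part_of_speech'].str.strip()
out = x[x['part_of_speech'].isin(['NN', 'NR'])].reset_index(drop=True)
print(out)
```

The final reset_index(drop=True) renumbers the rows 0..4 to match the layout requested in the question; drop it to keep the original positions (0, 3, 4, 6, 7) as in the output above.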
