Group by - Dataframe with lists

Question

I need some help with pandas.

I have a DataFrame that I want to group by the ID column (that part works so far). The Tags column contains lists with varying numbers of elements, including empty lists.

g = data_lemmatized.groupby('ID')['Tags'].apply(lambda x: list(np.unique(x)))

This is the original dataframe:

With the code I used, I'm receiving the following result:

What I would like to have in the new dataframe is:

- a single list per ID with no sub-lists inside, containing just the elements (or empty)
- no duplicates within the lists (a set of each grouped list)

Example:

0 -> []
1 -> []
2 -> [DTU]

Can someone help me please?

Gilseung Ahn · Accepted Answer

Try this code.

import pandas as pd
data_lemmatized = pd.DataFrame({"ID":[0, 1, 2, 2, 2],
                                "Tags": [[], [], ['DTU'], [], []]})

data_lemmatized.groupby('ID')['Tags'].sum().apply(set).apply(list)

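Here, summing the lists within each group concatenates them into one flat list. Applying set then removes the duplicates, and applying list converts the result back to a list. Note that a set does not preserve the original order of the elements.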
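If the order of the tags matters, or if summing a column of lists does not behave as expected in your pandas version, the same idea can be sketched with the standard library instead. This is only an illustration, not part of the original answer: the helper name flatten_unique is made up for this example, and it assumes every value in Tags is a list of hashable items.

```python
from itertools import chain

import pandas as pd

data_lemmatized = pd.DataFrame({"ID": [0, 1, 2, 2, 2],
                                "Tags": [[], [], ["DTU"], [], []]})


def flatten_unique(lists):
    """Hypothetical helper: flatten one level and drop duplicates.

    chain.from_iterable joins the group's lists into one stream of items;
    dict.fromkeys keeps only the first occurrence of each item, in order.
    """
    return list(dict.fromkeys(chain.from_iterable(lists)))


# One flat, de-duplicated list per ID
result = data_lemmatized.groupby("ID")["Tags"].apply(flatten_unique)
print(result)
```

Unlike the set-based version, this keeps the tags in the order they first appear within each group.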
