Reputation: 15
I need some help with Pandas.
I have a DataFrame which I want to group by the ID column (that works so far). The Tags column contains lists with varying numbers of elements, including empty lists.
g = data_lemmatized.groupby('ID')['Tags'].apply(lambda x: list(np.unique(x)))
This is the original dataframe:
With the code I used, I'm receiving the following result:
What I would like to have in the new dataframe is:
- a single flat list per ID with no sub-lists inside, containing just the elements (or empty)
- no duplicates within the lists (a set of each grouped list)
Example:
0 -> []
1 -> []
2 -> [DTU]
Can someone help me please?
Upvotes: 0
Views: 43
Reputation: 2614
Try this code.
import pandas as pd
data_lemmatized = pd.DataFrame({"ID": [0, 1, 2, 2, 2],
                                "Tags": [[], [], ['DTU'], [], []]})
# Concatenate the lists of each ID, then drop duplicates via set and convert back to list
data_lemmatized.groupby('ID')['Tags'].sum().apply(set).apply(list)
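Here, summing lists concatenates them, so each ID ends up with one flat list. Converting to a set then removes the duplicates; note that a set does not keep the original order of the elements.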
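If the order of the tags matters, a possible alternative is to flatten each group with itertools.chain and deduplicate with dict.fromkeys, which keeps the first-seen order. This is only a sketch on the same sample data; flatten_unique is a helper name made up for this example, not a pandas function.

```python
from itertools import chain

import pandas as pd

# Same sample data as in the answer above.
data_lemmatized = pd.DataFrame({"ID": [0, 1, 2, 2, 2],
                                "Tags": [[], [], ["DTU"], [], []]})


def flatten_unique(lists):
    """Flatten an iterable of lists into one list without duplicates.

    Hypothetical helper: dict.fromkeys drops repeated elements while
    preserving the order in which they first appear.
    """
    return list(dict.fromkeys(chain.from_iterable(lists)))


# One flat, duplicate-free list per ID.
g = data_lemmatized.groupby("ID")["Tags"].apply(flatten_unique)
print(g.to_dict())  # {0: [], 1: [], 2: ['DTU']}
```

Like the set-based version, this assumes the tags are hashable (strings are).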
Upvotes: 1