Mike

Reputation: 385

I cannot use groupby with a dataframe based on lists

I have a dataframe df with three columns, "id", "nodes_set", and "description", where each "nodes_set" value is a list of strings.

I am trying to split it into groups based on the values in "nodes_set", as follows:

df_by_nodes_set = df.groupby('nodes_set')
list(df_by_nodes_set)

I think the problem lies in the fact that I am trying to use groupby with lists, but I am not sure how to deal with that.

Upvotes: 0

Views: 301

Answers (1)

jedi

Reputation: 585

The question is unclear, but if you need to group by a list, you can convert each list into a hashable key, for example by hashing it or simply by concatenating its elements into an id, as below:

import pandas as pd

df = pd.DataFrame([[i, list(range(i)), 'sample ' + str(i)] for i in range(5)], columns=["id", "nodes_set", "description"])

# Build a hashable string key from each list, then group by that key.
nodes_set_key = df['nodes_set'].apply(lambda x: '_'.join(map(str, x)))
df.groupby(nodes_set_key).last()

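One caveat with joined strings: since "nodes_set" holds strings, two different lists can produce the same key if an element already contains the separator. A minimal sketch of an alternative, assuming it is acceptable to key the groups by tuples (which are hashable, unlike lists); the sample data and the names tuple_key, joined_key, ids_by_tuple and ids_by_joined are made up for illustration:

```python
import pandas as pd

# Hypothetical sample data: "nodes_set" holds lists of strings, as in the question.
df = pd.DataFrame(
    {
        "id": [1, 2, 3, 4],
        "nodes_set": [["a", "b"], ["a_b"], ["a", "b"], ["c"]],
        "description": ["first", "second", "third", "fourth"],
    }
)

# Tuples are hashable, so they can serve directly as group keys.
tuple_key = df["nodes_set"].apply(tuple)
ids_by_tuple = df.groupby(tuple_key, sort=False)["id"].apply(list).to_dict()

# Joined strings can collide: ["a", "b"] and ["a_b"] both become "a_b".
joined_key = df["nodes_set"].apply(lambda x: "_".join(x))
ids_by_joined = df.groupby(joined_key, sort=False)["id"].apply(list).to_dict()

print(ids_by_tuple)
print(ids_by_joined)
```

With this sample data, the tuple keys keep ["a", "b"] and ["a_b"] in separate groups, while the joined keys merge them into one. If the order of elements inside a list should not matter, sorting each list before converting it to a tuple would be one way to handle that.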

Upvotes: 1
