Tonino Fernandez
Tonino Fernandez

Reputation: 551

when grouping dataframe join the values that are different for a certain column

I'm first going to show with an example what I mean:

Let's suppose that I have this dataframe:

enter image description here

If I group the dataframe without the penultimate column (column_2), I want to end up with this:

enter image description here

And, if I have a dataframe where column_1 and last_column have the same values. I don't need to "join" or "append" the values in the "column_2", I just want an empty dataframe.

Does that make sense what I mean? What I've just have is this:

import pandas as pd

data = {'column_1': ['no', 'no', 'no', 'no'], 'column_2': ['spain', 'france', 'italy', 'germany'], "last_column": ['A', 'A', 'A', 'B']}

df = pd.DataFrame.from_dict(data)

aux = df.drop(columns = ['column_2'])
indices_to_keep = aux.groupby(aux.columns.to_list()).filter(lambda x : len(x)<2).index
df_to_keep = df.filter(items = indices_to_keep.to_list(), axis = 0)

My problem with this code, is that I don't know how to join the values on a single row when the df is being grouped.

Upvotes: 0

Views: 33

Answers (1)

Matt
Matt

Reputation: 1284

I think you can just aggregate into a list:

df.groupby(
    ['column_1', 'last_column']
)['column_2'].agg(list).reset_index()[
    ['column_1', 'column_2', 'last_column']
]

Upvotes: 1

Related Questions