Erwin

Reputation: 381

Select a single value from a column after grouping by other columns in Python

I tried to select a single value of the column class from each group of my DataFrame after grouping by the columns first_register and second_register, but it did not work.

Suppose I have a dataframe like this:

import numpy as np
import pandas as pd
# np.nan is the canonical spelling; the np.NAN alias was removed in NumPy 2.0
df = pd.DataFrame({'class': [1, 1, 1, 2, 2, 2, 0, 0, 1],
                   'first_register': ["70/20", "70/20", "70/20", "71/20", "71/20", "71/20", np.nan, np.nan, np.nan],
                   'second_register': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, "72/20", "72/20", "73/20"]})

What I tried, which did not work:

group_by_df = df.groupby(["first_register", "second_register"])
label_class = group_by_df["class"].unique()
print(label_class)

How can I select the single class label from each group of the DataFrame?

The desired output is an ordered list giving the class of each group, from the first group to the last:

label_class = [1, 2, 0, 1]

Upvotes: 0

Views: 1056

Answers (2)

ansev

Reputation: 30940

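By default, groupby drops rows whose group keys contain NaN, and here every row has NaN in one of the two key columns, so no groups remain. Pass dropna=False (pandas 1.1+) to keep them: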

group_by_df = df.groupby(["first_register", "second_register"], dropna=False)
label_class = group_by_df["class"].unique()
print(label_class)

first_register  second_register
70/20           NaN                [1]
71/20           NaN                [2]
NaN             72/20              [0]
                73/20              [1]
Name: class, dtype: object

If you know that each group has only one unique class, or you just want the first or the last value of each group:

label_class = group_by_df["class"].first()

Or:

label_class = group_by_df["class"].last()
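
To see why the original attempt returns nothing, here is a minimal sketch that rebuilds the question's DataFrame and compares group sizes with and without dropna=False. The names keys, sizes_default and sizes_keep_nan are only for illustration:

```python
import numpy as np
import pandas as pd

# Same data as in the question
df = pd.DataFrame({
    "class": [1, 1, 1, 2, 2, 2, 0, 0, 1],
    "first_register": ["70/20"] * 3 + ["71/20"] * 3 + [np.nan] * 3,
    "second_register": [np.nan] * 6 + ["72/20", "72/20", "73/20"],
})
keys = ["first_register", "second_register"]

# Default (dropna=True): every row has NaN in one of the keys,
# so every row is dropped and the result is empty.
sizes_default = df.groupby(keys).size()

# dropna=False treats NaN as a valid key value, so all four groups survive.
sizes_keep_nan = df.groupby(keys, dropna=False).size()

print(len(sizes_default))        # 0
print(sizes_keep_nan.tolist())   # [3, 3, 2, 1]
```

Note that first() and last() skip NaN values inside a group, which does not matter here because class has no missing values.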

Upvotes: 2

jezrael

Reputation: 863761

Use GroupBy.first:

out = df.groupby(["first_register", "second_register"], dropna=False)["class"].first()
print(out)

first_register  second_register
70/20           NaN                1
71/20           NaN                2
NaN             72/20              0
                73/20              1
Name: class, dtype: int64


label_class = out.tolist()
print(label_class)
[1, 2, 0, 1]
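
An alternative sketch, not taken from the answers above: if you can assume that every row has exactly one of the two registers filled in (true for the sample data, but an assumption in general), you could merge them into a single key with fillna and group on that. The name register is hypothetical. Passing sort=False keeps the groups in order of first appearance rather than in sorted key order:

```python
import numpy as np
import pandas as pd

# Same data as in the question
df = pd.DataFrame({
    "class": [1, 1, 1, 2, 2, 2, 0, 0, 1],
    "first_register": ["70/20"] * 3 + ["71/20"] * 3 + [np.nan] * 3,
    "second_register": [np.nan] * 6 + ["72/20", "72/20", "73/20"],
})

# Assumption: exactly one register is non-null per row, so filling the gaps
# in first_register from second_register gives one complete key column.
register = df["first_register"].fillna(df["second_register"]).rename("register")

# No NaN keys are left, so dropna=False is not needed here.
label_class = df.groupby(register, sort=False)["class"].first().tolist()
print(label_class)
```

If both registers can be present on the same row, this merged key would hide the second one, so stick with the two-column groupby and dropna=False in that case.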

Upvotes: 1
