Reputation: 381
I tried to select a single value of the class column from each group of my DataFrame after grouping by the columns first_register and second_register, but it did not seem to work.
Suppose I have a dataframe like this:
import numpy as np
import pandas as pd
df = pd.DataFrame({'class': [1, 1, 1, 2, 2, 2, 0, 0, 1],
                   'first_register': ["70/20", "70/20", "70/20", "71/20", "71/20", "71/20", np.nan, np.nan, np.nan],
                   'second_register': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, "72/20", "72/20", "73/20"]})
What I tried, which did not work at all:
group_by_df = df.groupby(["first_register", "second_register"])
label_class = group_by_df["class"].unique()
print(label_class)
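A likely reason the attempt above returns nothing: groupby drops rows whose group key contains NaN by default (dropna=True), and every row here has NaN in either first_register or second_register, so no group survives. A minimal check, assuming pandas 1.1 or later (where the dropna parameter of groupby exists):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'class': [1, 1, 1, 2, 2, 2, 0, 0, 1],
                   'first_register': ["70/20", "70/20", "70/20", "71/20", "71/20", "71/20", np.nan, np.nan, np.nan],
                   'second_register': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, "72/20", "72/20", "73/20"]})

keys = ["first_register", "second_register"]

# Every row has a NaN in at least one of the two key columns
rows_with_nan_key = df[keys].isna().any(axis=1)
print(rows_with_nan_key.all())

# With the default dropna=True those rows are excluded, so no groups remain
print(df.groupby(keys).ngroups)

# Keeping NaN as a valid key value recovers the four groups
print(df.groupby(keys, dropna=False).ngroups)
```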
How can I select the single class label from each group of the DataFrame?
The desired output is an ordered list holding the class of each group, from the first group to the last:
label_class = [1, 2, 0, 1]
Upvotes: 0
Views: 1056
Reputation: 30940
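Use dropna=False so that rows with NaN in a key column are kept instead of dropped (the parameter exists since pandas 1.1):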
group_by_df = df.groupby(["first_register", "second_register"], dropna=False)
label_class = group_by_df["class"].unique()
print(label_class)
first_register  second_register
70/20           NaN                [1]
71/20           NaN                [2]
NaN             72/20              [0]
                73/20              [1]
Name: class, dtype: object
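Each value in that result is a NumPy array of the distinct classes in the group. A minimal sketch, assuming every group really has exactly one distinct class, that checks this assumption and flattens the arrays into the plain list asked for:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'class': [1, 1, 1, 2, 2, 2, 0, 0, 1],
                   'first_register': ["70/20", "70/20", "70/20", "71/20", "71/20", "71/20", np.nan, np.nan, np.nan],
                   'second_register': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, "72/20", "72/20", "73/20"]})

group_by_df = df.groupby(["first_register", "second_register"], dropna=False)
label_class = group_by_df["class"].unique()  # Series whose values are NumPy arrays

# Fail loudly if some group holds more than one distinct class
if not label_class.map(len).eq(1).all():
    raise ValueError("some group has more than one distinct class")

# Take the single element out of each array and convert it to a Python int
flat = [int(arr[0]) for arr in label_class]
print(flat)  # [1, 2, 0, 1]
```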
If you know each group has exactly one unique class, or you simply want the first or the last value of each group:
label_class = group_by_df["class"].first()
Or:
label_class = group_by_df["class"].last()
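first() and last() only agree when a group holds a single distinct class. One way to see both at once, together with nunique as a sanity check, is a single agg call; this is a sketch on the question's data, not the only way to verify the assumption:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'class': [1, 1, 1, 2, 2, 2, 0, 0, 1],
                   'first_register': ["70/20", "70/20", "70/20", "71/20", "71/20", "71/20", np.nan, np.nan, np.nan],
                   'second_register': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, "72/20", "72/20", "73/20"]})

keys = ["first_register", "second_register"]

# One row per group with the first value, the last value and the distinct count
out = df.groupby(keys, dropna=False)["class"].agg(["first", "last", "nunique"])

# Groups where first and last could disagree (none for this data)
ambiguous = out[out["nunique"] > 1]
print(ambiguous.empty)
print(out["first"].tolist())  # [1, 2, 0, 1]
```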
Upvotes: 2
Reputation: 863761
Use GroupBy.first:
out = df.groupby(["first_register", "second_register"], dropna=False)["class"].first()
print(out)
first_register  second_register
70/20           NaN                1
71/20           NaN                2
NaN             72/20              0
                73/20              1
Name: class, dtype: int64
label_class = out.tolist()
print(label_class)
[1, 2, 0, 1]
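The order of that list comes from groupby sorting the group keys (sort=True by default), with NaN keys placed after real values. If the list should instead follow the order in which groups first appear in the DataFrame, passing sort=False is one option; a sketch assuming a recent pandas (2.x), where for this particular data both orders happen to coincide:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'class': [1, 1, 1, 2, 2, 2, 0, 0, 1],
                   'first_register': ["70/20", "70/20", "70/20", "71/20", "71/20", "71/20", np.nan, np.nan, np.nan],
                   'second_register': [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, "72/20", "72/20", "73/20"]})

keys = ["first_register", "second_register"]

# Default: groups ordered by sorted key values, NaN keys last
sorted_labels = df.groupby(keys, dropna=False)["class"].first().tolist()

# sort=False: groups ordered by first appearance in the DataFrame
appearance_labels = df.groupby(keys, dropna=False, sort=False)["class"].first().tolist()

print(sorted_labels)      # [1, 2, 0, 1]
print(appearance_labels)  # [1, 2, 0, 1]
```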
Upvotes: 1