Khris

Reputation: 3212

Pandas: Get unique items from a groupby into separate rows instead of arrays

Calling the unique() method on a Series returns a numpy array, and the same happens for each group when calling it on a groupby. Consider this example:

import pandas as pd
L0 = ['G','i','G','h','j','h','G','j']
L1 = ['A','A','B','B','B','B','B','B']

df = pd.DataFrame({"A":L0,"B":L1})
dg = df.groupby('B').A.unique()

Resulting in this:

Out[56]: 
B
A       [G, i]
B    [G, h, j]
Name: A, dtype: object

I want each unique element in its own row though:

   A
B   
A  G
A  i
B  G
B  h
B  j

I can achieve this by hand like this (I'm deliberately omitting any iteration over DataFrames and only use the underlying numpy arrays):

frames = []
for i in range(dg.index.nunique()):
    ds = pd.Series(dg.values[i]).to_frame()
    ds.columns = ["A"]
    ds["B"] = dg.index.values[i]
    frames.append(ds)
# combine the per-group frames and index the result by group label
de = pd.concat(frames).set_index('B')

But I'm wondering if there is a shorter (and faster) way that doesn't need loops, creating new Series or DataFrames, or messing around with the numpy arrays.

If not, I might propose it as a feature.

Upvotes: 1

Views: 59

Answers (1)

jezrael

Reputation: 863166

You can use apply with Series:

dg = (df.groupby('B').A
        .apply(lambda x: pd.Series(x.unique()))
        .reset_index(level=1, drop=True)
        .to_frame())
print(dg)
   A
B   
A  G
A  i
B  G
B  h
B  j

Another possible solution is drop_duplicates:

df = df.drop_duplicates(['A','B']).set_index('B')
print (df)
   A
B   
A  G
A  i
B  G
B  h
B  j
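
A further option, assuming pandas 0.25 or later (where Series.explode is available), is to keep the groupby from the question and explode the resulting arrays into rows. This is only a minimal sketch, not benchmarked against the two approaches above; it rebuilds the sample df from the question so it runs on its own:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "A": ['G', 'i', 'G', 'h', 'j', 'h', 'G', 'j'],
    "B": ['A', 'A', 'B', 'B', 'B', 'B', 'B', 'B'],
})

# unique() yields one array per group; explode() spreads each array
# over several rows, repeating the group label in the index.
result = df.groupby('B').A.unique().explode().to_frame()
print(result)
```

The result keeps B as the index and A as the only column, like the outputs shown above.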

Upvotes: 1
