Reputation: 3212
When using the unique() method on a Series, you get a NumPy array as the result; the same happens when applying it after a groupby. Consider this example:
import pandas as pd
L0 = ['G','i','G','h','j','h','G','j']
L1 = ['A','A','B','B','B','B','B','B']
df = pd.DataFrame({"A":L0,"B":L1})
dg = df.groupby('B').A.unique()
Resulting in this:
Out[56]:
B
A [G, i]
B [G, h, j]
Name: A, dtype: object
I want each unique element in its own row though:
A
B
A G
A i
B G
B h
B j
I can achieve this by hand (deliberately avoiding iteration over DataFrames and using only the underlying NumPy arrays):
de = pd.DataFrame(columns=["A", "B"])
for i in range(dg.index.nunique()):
    ds = pd.Series(dg.values[i]).to_frame()
    ds.columns = ["A"]
    ds["B"] = dg.index.values[i]
    de = pd.concat([de, ds])  # DataFrame.append was removed in pandas 2.0
de = de.set_index('B')
But I'm wondering if there is a shorter (and faster) way that doesn't need loops, creating new Series or DataFrames, or messing around with the underlying NumPy arrays.
If not, I might propose it as a feature.
Upvotes: 1
Views: 59
Reputation: 863166
You can use apply with Series:
dg = (df.groupby('B').A
        .apply(lambda x: pd.Series(x.unique()))
        .reset_index(level=1, drop=True)
        .to_frame())
print (dg)
A
B
A G
A i
B G
B h
B j
Another possible solution is drop_duplicates:
df = df.drop_duplicates(['A','B']).set_index('B')
print (df)
A
B
A G
A i
B G
B h
B j
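If your pandas is recent enough (0.25+), Series.explode offers a third route: unique() yields one array per group, and explode() then puts each array element on its own row. A minimal sketch, assuming a recent pandas version:

```python
import pandas as pd

L0 = ['G','i','G','h','j','h','G','j']
L1 = ['A','A','B','B','B','B','B','B']
df = pd.DataFrame({"A": L0, "B": L1})

# one array of uniques per group, then one row per array element
dg = df.groupby('B').A.unique().explode().to_frame()
print (dg)
```

This produces the same output as the apply solution, without the lambda or the reset_index step.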
Upvotes: 1