Reputation: 5188
This refers to the very popular question about converting a groupby result to a DataFrame. Unfortunately, I don't think the use case there is the most useful one, so here is mine:
Suppose you have what could be a hierarchical dataset in a flattened form, e.g.
   key  val
0  'a'    2
1  'a'    1
2  'b'    3
3  'b'    4
What I want to do is convert that DataFrame to this structure:
   'a'  'b'
0    2    3
1    1    4
I thought this would be as simple as pd.DataFrame(df.groupby('key').groups), but it isn't.
How can I make this transformation?
Upvotes: 2
Views: 3289
Reputation: 7984
I think this should work. Note that the example differs from the OP's: it adds a duplicate row, so the groups have unequal lengths.
import pandas as pd

df = pd.DataFrame({'key': {0: "'a'", 1: "'a'", 2: "'b'", 3: "'b'", 4: "'a'"},
                   'val': {0: 2, 1: 1, 2: 3, 3: 4, 4: 2}})
# Collect each group's values into a list, then build a frame from the dict
df_wanted = pd.DataFrame.from_dict(
    df.groupby("key")["val"].apply(list).to_dict(), orient='index'
).transpose()
'a' 'b'
0 2.0 3.0
1 1.0 4.0
2 2.0 NaN
df.groupby("key")["val"].apply(list).to_dict() creates the dictionary {"'a'": [2, 1, 2], "'b'": [3, 4]}. We then convert that dictionary to a DataFrame with DataFrame.from_dict. Because the lists in the dictionary have different lengths, we need to pass the extra argument orient='index', which turns each key into a row and pads the shorter rows with NaN, and then call transpose() at the end so the keys become columns.
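To make the role of orient='index' concrete, here is a minimal sketch using a hand-written dictionary (the plain keys "a" and "b" and the variable names are illustrative assumptions, not part of the answer above). It shows that passing unequal-length lists straight to pd.DataFrame fails, while the from_dict route pads with NaN.

```python
import pandas as pd

# Hypothetical stand-in for the dictionary produced by groupby(...).apply(list).to_dict()
groups = {"a": [2, 1, 2], "b": [3, 4]}

# Passing unequal-length lists directly as columns raises ValueError
try:
    pd.DataFrame(groups)
    direct_ok = True
except ValueError:
    direct_ok = False

# orient='index' makes each key a row; shorter rows are padded with NaN.
# transpose() then turns the keys back into columns.
wide = pd.DataFrame.from_dict(groups, orient="index").transpose()
print(wide)
```

Because of the NaN padding, the resulting columns come out as floats rather than integers, which matches the 2.0 / NaN output shown above.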
Reference
Creating dataframe from a dictionary where entries have different lengths
Upvotes: 2
Reputation: 153460
Let's use set_index and unstack with cumcount:
df.set_index([df.groupby('key').cumcount(), 'key'])['val']\
  .unstack().rename_axis(None, axis=1)
Output:
'a' 'b'
0 2 3
1 1 4
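For readers unfamiliar with cumcount, the same chain can be sketched step by step. This is only an illustration: the frame is rebuilt here with plain keys "a" and "b", and the name pos is an assumption introduced for the example.

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b", "b"], "val": [2, 1, 3, 4]})

# cumcount numbers the rows inside each group: 0, 1 for 'a' and 0, 1 for 'b'
pos = df.groupby("key").cumcount()

wide = (
    df.set_index([pos, "key"])["val"]  # MultiIndex of (position in group, key)
      .unstack()                       # move the 'key' level into the columns
      .rename_axis(None, axis=1)       # drop the leftover 'key' label on the columns
)
print(wide)
```

The within-group position is what gives each value a unique (row, column) slot, so unstack has no duplicate entries to complain about.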
Upvotes: 0
Reputation: 323226
df.assign(index=df.groupby('key').cumcount()).pivot(index='index', columns='key', values='val')
Out[369]:
key 'a' 'b'
index
0 2 3
1 1 4
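As a rough check of how the pivot approach behaves when the groups have different sizes, here is a small sketch. The extra row, the plain keys "a" and "b", and the helper column name pos are assumptions made for illustration only.

```python
import pandas as pd

# Three rows for 'a' but only two for 'b'
df = pd.DataFrame({"key": ["a", "a", "b", "b", "a"], "val": [2, 1, 3, 4, 2]})

wide = (
    df.assign(pos=df.groupby("key").cumcount())       # position of each row within its group
      .pivot(index="pos", columns="key", values="val")
)
print(wide)
```

The slot with no matching row (position 2 of 'b') is filled with NaN, so no error is raised for ragged groups.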
Upvotes: 7
Reputation: 210832
What about the following approach?
In [134]: pd.DataFrame(df.set_index('val').groupby('key').groups)
Out[134]:
a b
0 2 3
1 1 4
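The reason set_index('val') helps is that .groups maps each key to the row labels of its group, not to the values. A short sketch of the difference, rebuilt with plain keys "a" and "b" (the variable names are illustrative assumptions):

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b", "b"], "val": [2, 1, 3, 4]})

# With the default RangeIndex, .groups holds row labels
by_label = df.groupby("key").groups
label_lists = {k: v.tolist() for k, v in by_label.items()}  # {'a': [0, 1], 'b': [2, 3]}

# After set_index('val'), the row labels are the values themselves
by_val = df.set_index("val").groupby("key").groups
wide = pd.DataFrame(by_val)
print(wide)
```

Since this passes the group labels to the DataFrame constructor as columns, I would expect it to work only when every group has the same number of rows.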
Upvotes: 3
Reputation: 5188
I'm new to Pandas but this seems to work:
gb = df.groupby('key')
k = 'val'
pd.DataFrame(
    [gb.get_group(x)[k].tolist() for x in gb.groups],
    index=list(gb.groups)
).transpose()
Upvotes: 0