Fred

Reputation: 93

Pandas - Groupby dataframe store as dataframe without aggregating

I'm new to Pandas, and although I've read a lot of documentation, posts and answers here, I haven't been able to work out a good strategy for reaching my goal. Sorry if it's already been answered; I couldn't find it. Here is what I have:

import pandas as pd

df = {'key': ['A', 'B', 'A', 'B'], 'value': [2,2,1,1]}
df = pd.DataFrame(df)
df
  key  value
0   A      2
1   B      2
2   A      1
3   B      1

I know that groupby() returns a GroupBy object, and that I can do a lot of aggregation with it (count, size, mean, etc.). However, I don't want to aggregate; I just want to group my dataframe by the 'key' column and store the result as a dataframe like the following:

  key  value
0   A      2
1   A      1
2   B      2
3   B      1

Once I get this step done, what I eventually want is to order each group by value like the following:

  key  value
0   A      1
1   A      2
2   B      1
3   B      2

Any answer, comment, or hint is greatly appreciated. Thanks!

Upvotes: 4

Views: 23105

Answers (4)

hello_km_world

Reputation: 1

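import pandas as pd

def regroup_key_then_sort_value(df):
    # Sort each 'key' group by 'value'; groupby iterates over the keys
    # in sorted order (sort=True).
    dfs = []
    for _, grp in df.groupby('key', sort=True):
        dfs.append(grp.sort_values('value'))

    # Stack the sorted groups back together and renumber the rows 0..n-1.
    return pd.concat(dfs).reset_index(drop=True)

df = pd.DataFrame({'key': ['A', 'B', 'A', 'B'], 'value': [2,2,1,1]})

output = regroup_key_then_sort_value(df)
print(output)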

Output:

  key  value
0   A      1
1   A      2
2   B      1
3   B      2
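The loop above can also be written as a single pd.concat over a generator of sorted groups. This is a minimal sketch of the same technique (not a different algorithm), assuming a pandas version where concat's ignore_index=True renumbers the result:

```python
import pandas as pd

df = pd.DataFrame({'key': ['A', 'B', 'A', 'B'], 'value': [2, 2, 1, 1]})

# Same idea as regroup_key_then_sort_value: iterate over the groups
# (groupby sorts the keys by default), sort each group by 'value',
# then stitch the pieces back together with a fresh 0..n-1 index.
output = pd.concat(
    (grp.sort_values('value') for _, grp in df.groupby('key')),
    ignore_index=True,
)
print(output)
```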

Upvotes: 0

Santiago

Reputation: 236

If the reason you want to use groupby is to preserve the index structure, then you can do the following:

import pandas as pd

df = {'key': ['A', 'B', 'A', 'B'], 'value': [2,2,1,1]}
df = pd.DataFrame(df)
print(df)

  key  value
0   A      2
1   B      2
2   A      1
3   B      1

So, first create the index:

df.set_index(['key'], inplace=True)
print(df)

     value
key       
A        2
B        2
A        1
B        1

Then, sort the index:

df.sort_index(inplace=True)
print(df)

     value
key       
A        2
A        1
B        2
B        1

Then, sort the values:

df.sort_values('value',inplace=True)
print(df)

     value
key       
A        1
B        1
A        2
B        2

And if you want 'key' back as a regular column (with a fresh 0..n-1 row index), finally do:

df.reset_index(inplace=True)
print(df)

  key  value
0   A      1
1   B      1
2   A      2
3   B      2
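Sorting by value alone, as in the step before reset_index, reorders rows across keys (A, B, A, B), so the groups end up interleaved rather than in the layout the question asks for. A minimal sketch of one way to keep them together, assuming pandas 0.23 or later (where sort_values accepts index level names in by), is to sort by the 'key' index level and the 'value' column in one call:

```python
import pandas as pd

df = pd.DataFrame({'key': ['A', 'B', 'A', 'B'], 'value': [2, 2, 1, 1]})
df = df.set_index('key')

# 'key' is now an index level name and 'value' is a column; sort_values
# accepts both in `by`, so the keys stay grouped and each group is
# ordered by value.
out = df.sort_values(['key', 'value'])
print(out)
```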

Upvotes: 2

CoreDump

Reputation: 831

If you are willing to do it without using chaining then this should work...

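import pandas as pd

df = {'key': ['A', 'B', 'A', 'B'], 'value': [2,2,1,1]}
df = pd.DataFrame(df)

# One group per (key, value) pair; iterating yields (name, sub-frame) tuples.
groups = df.groupby(['key', 'value'])
# Order the tuples by group name (groupby already sorts its keys by
# default, so this makes the ordering explicit).
groups = sorted(groups)
df = pd.concat([g for _, g in groups])

print(df)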

For the record, I don't fully understand why you wouldn't just sort the entire frame. My guess is that you need the groups for other transformations besides sorting anyway, and so want to avoid sorting the entire frame. If you found better performance by doing this, please let me know :)
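If the groups really are needed for other transformations, one hedged option is to keep each one as its own DataFrame in a dictionary keyed by group name, and only recombine when a single frame is needed. The dictionary layout and the names groups and combined are illustrative choices, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({'key': ['A', 'B', 'A', 'B'], 'value': [2, 2, 1, 1]})

# Keep each group as its own DataFrame, already sorted by value, so it
# can be transformed separately later (hypothetical layout).
groups = {name: grp.sort_values('value') for name, grp in df.groupby('key')}

# Recombine into a single frame when needed, renumbering rows 0..n-1.
combined = pd.concat(groups.values(), ignore_index=True)
print(groups['A'])
print(combined)
```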

Upvotes: 1

root

Reputation: 33793

You can get your desired output by sorting your dataframe with sort_values instead of doing a groupby.

df.sort_values(['key', 'value'], inplace=True)
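On its own, this keeps the original row labels (2, 0, 3, 1), while the question's target shows 0..3. A minimal sketch, assuming pandas 1.0 or later for the ignore_index parameter of sort_values (on older versions, chain .reset_index(drop=True) instead):

```python
import pandas as pd

df = pd.DataFrame({'key': ['A', 'B', 'A', 'B'], 'value': [2, 2, 1, 1]})

# Sort by key first, then by value within each key; ignore_index=True
# renumbers the rows 0..n-1 to match the layout in the question.
df = df.sort_values(['key', 'value'], ignore_index=True)
print(df)
```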

Edit:

If you really want to use groupby to perform the grouping of the keys, you could apply a trivial filter to the groupby object.

df = df.groupby('key').filter(lambda x: True)

This doesn't seem like the best way to get a dataframe back, but nothing else immediately comes to mind. Afterwards you'd still need to use sort_values to order the values column.
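If the aim is only the intermediate layout from the question (rows grouped by key, original order kept inside each group), a stable sort on 'key' gives it directly without groupby. This is a swapped-in technique rather than what the answer above uses; a minimal sketch, assuming pandas 1.0 or later for ignore_index:

```python
import pandas as pd

df = pd.DataFrame({'key': ['A', 'B', 'A', 'B'], 'value': [2, 2, 1, 1]})

# A stable sort (mergesort) on 'key' alone brings equal keys together
# while keeping their original relative order: the "grouped but not
# aggregated" layout.
grouped = df.sort_values('key', kind='mergesort', ignore_index=True)
print(grouped)
```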

Upvotes: 11
