Reputation: 93
I'm new to Pandas. I've read a lot of documentation, posts, and answers here, but I haven't been able to work out a good strategy for my goal. Sorry if it's already been answered; I couldn't find it. Here is what I have:
import pandas as pd
df = {'key': ['A', 'B', 'A', 'B'], 'value': [2,2,1,1]}
df = pd.DataFrame(df)
df
key value
0 A 2
1 B 2
2 A 1
3 B 1
I know that calling groupby() returns a groupby object, and that I can do a lot of aggregation (count, size, mean, etc.) with that object. However, I don't want to aggregate. I just want to group my dataframe by the 'key' column and store the result as a dataframe like the following:
key value
0 A 2
1 A 1
2 B 2
3 B 1
Once I get this step done, what I eventually want is to order each group by value like the following:
key value
0 A 1
1 A 2
2 B 1
3 B 2
Any answer, comment, or hint is greatly appreciated. Thanks!
Upvotes: 4
Views: 23105
Reputation: 1
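import pandas as pd

def regroup_key_then_sort_value(df):
    # Walk the groups in key order and sort each group's rows by value
    frames = []
    for name, grp in df.groupby('key', sort=True):
        frames.append(grp.sort_values('value'))
    # Stitch the sorted groups back together with a fresh 0..n-1 index
    return pd.concat(frames).reset_index(drop=True)

df = pd.DataFrame({'key': ['A', 'B', 'A', 'B'], 'value': [2,2,1,1]})
output = regroup_key_then_sort_value(df)
print(output)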
Output:
key value
0 A 1
1 A 2
2 B 1
3 B 2
Upvotes: 0
Reputation: 236
If the reason you want to use groupby is to preserve the index structure, then you can do the following:
df = {'key': ['A', 'B', 'A', 'B'], 'value': [2,2,1,1]}
df = pd.DataFrame(df)
print(df)
key value
0 A 2
1 B 2
2 A 1
3 B 1
So, first create the index:
df.set_index(['key'], inplace=True)
print(df)
value
key
A 2
B 2
A 1
B 1
Then, sort the index:
df.sort_index(inplace=True)
print(df)
value
key
A 2
A 1
B 2
B 1
Then, sort the values:
df.sort_values('value',inplace=True)
print(df)
value
key
A 1
B 1
A 2
B 2
And if you want key back as a regular column with a default integer index, finally do:
df.reset_index(inplace=True)
print(df)
key value
0 A 1
1 B 1
2 A 2
3 B 2
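Note that the last sort above orders the whole frame by value, so the keys end up interleaved. If the goal is the per-key ordering from the question, one possible variation on the same index-based steps is sketched below. It assumes a stable sort (kind='mergesort') is acceptable: sorting by value first and then stably by the key index keeps the value order inside each key. The intermediate names (indexed, by_value, grouped, result) are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'key': ['A', 'B', 'A', 'B'], 'value': [2, 2, 1, 1]})

# Move 'key' into the index, as in the steps above
indexed = df.set_index('key')

# Sort by value first, using a stable algorithm
by_value = indexed.sort_values('value', kind='mergesort')

# Then sort by the key index, again stably, so the value order
# established above is kept inside each key
grouped = by_value.sort_index(kind='mergesort')

# Turn 'key' back into a regular column with a default integer index
result = grouped.reset_index()
print(result)
```

The order of the two sorts matters here: the sort applied last decides the outer ordering, and stability is what carries the earlier ordering through.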
Upvotes: 2
Reputation: 831
If you are willing to do it without using chaining then this should work...
df = {'key': ['A', 'B', 'A', 'B'], 'value': [2,2,1,1]}
df = pd.DataFrame(df)
groups = df.groupby(['key', 'value'])
groups = sorted(groups)
df = pd.concat([g for _, g in groups])
print(df)
For the record, I don't fully understand why you wouldn't sort the entire frame... I am guessing that you need groups for other transformations besides sorting anyway, and so you want to save yourself from having to sort the entire frame. If you found better performance by doing this then please let me know :)
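For anyone wanting to check whether the groupby route gives anything different from a plain sort, here is a small sketch on the example data. It relies on groupby sorting the group keys by default, so the explicit sorted() call is left out; the names via_groups and via_sort are just illustrative. It only compares results on this tiny frame and says nothing about performance.

```python
import pandas as pd

df = pd.DataFrame({'key': ['A', 'B', 'A', 'B'], 'value': [2, 2, 1, 1]})

# groupby sorts the group keys by default, so iterating over the groups
# already yields them in (key, value) order
via_groups = pd.concat([g for _, g in df.groupby(['key', 'value'])])

# Sorting the whole frame on both columns
via_sort = df.sort_values(['key', 'value'])

# Compare the two results, including their (original) row labels
print(via_groups.equals(via_sort))
```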
Upvotes: 1
Reputation: 33793
You can get your desired output by sorting your dataframe with sort_values instead of doing a groupby.
df.sort_values(['key', 'value'], inplace=True)
Edit:
If you really want to use groupby to perform the grouping of the keys, you could apply a trivial filter to the groupby object.
df = df.groupby('key').filter(lambda x: True)
This doesn't seem like the best way to get a dataframe back, but nothing else immediately comes to mind. Afterwards you'd still need to use sort_values to order the values column.
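Putting the pieces together, a minimal sketch on the question's example data might look like this. It uses the non-inplace form of sort_values and adds reset_index(drop=True), which is my assumption about how to get the 0..3 index shown in the question. As far as I can tell, the trivial filter hands the rows back in their original order, so it is the sort that does the actual reordering; the last two lines let you check that on your own pandas version.

```python
import pandas as pd

df = pd.DataFrame({'key': ['A', 'B', 'A', 'B'], 'value': [2, 2, 1, 1]})

# Sort by key, then by value within each key, and renumber the rows
result = df.sort_values(['key', 'value']).reset_index(drop=True)
print(result)

# The trivial filter keeps every row; compare its row order with the original
filtered = df.groupby('key').filter(lambda x: True)
print(filtered.equals(df))
```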
Upvotes: 11