Reputation: 2681
I want to print the result of grouping with Pandas.
I have a dataframe:
import pandas as pd
df = pd.DataFrame({'A': ['one', 'one', 'two', 'three', 'three', 'one'], 'B': range(6)})
print(df)
       A  B
0    one  0
1    one  1
2    two  2
3  three  3
4  three  4
5    one  5
When printing after grouping by 'A' I have the following:
print(df.groupby('A'))
<pandas.core.groupby.DataFrameGroupBy object at 0x05416E90>
How can I print the dataframe grouped?
If I do:
print(df.groupby('A').head())
I obtain the dataframe as if it was not grouped:
             A  B
A
one   0    one  0
      1    one  1
two   2    two  2
three 3  three  3
      4  three  4
one   5    one  5
I was expecting something like:
             A  B
A
one   0    one  0
      1    one  1
      5    one  5
two   2    two  2
three 3  three  3
      4  three  4
Upvotes: 247
Views: 401082
Reputation: 16
Assign the GroupBy object to a variable and call its .first() method. Example:
a = df_apps_clean[['App', 'Installs']].groupby('Installs')
a.first()  # shows the first row of each group
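As a minimal sketch on the question's dataframe (assuming pandas is installed), first() returns one row per group, so it works as a quick summary rather than a full listing of every group:

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three', 'three', 'one'],
                   'B': range(6)})

grouped = df.groupby('A')
# first() returns the first non-null value of each column per group,
# indexed by the group key (keys are sorted by default)
first_rows = grouped.first()
print(first_rows)
```

Note that rows 1, 4 and 5 do not appear in the result, since only the first entry of each group is kept.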
Upvotes: 1
Reputation: 1
Use the get_group() method. For example:
new_group = df.groupby('A')
new_group.get_group('one')
Put the name of the group you want inside the call.
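If you do not know the group names in advance, one possible approach is to list them first and guard against a missing name. The sketch below uses the question's dataframe; the name 'four' is a hypothetical key chosen only to show the failure case:

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three', 'three', 'one'],
                   'B': range(6)})

grouped = df.groupby('A')

# .groups maps each group name to the index labels of its rows
available = list(grouped.groups.keys())
print(available)

for name in ['one', 'four']:  # 'four' is a hypothetical, non-existent group
    try:
        print(grouped.get_group(name))
    except KeyError:
        # get_group raises KeyError when the name is not a group key
        print("no such group:", name)
```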
Upvotes: -1
Reputation: 11568
Simply do:
grouped_df = df.groupby('A')
for key, item in grouped_df:
    print(grouped_df.get_group(key), "\n\n")
Deprecation notice: ix was deprecated in pandas 0.20.0 and removed in 1.0, so the code here uses loc instead.
This also works:
grouped_df = df.groupby('A')
gb = grouped_df.groups
for key, values in gb.items():
    print(df.loc[values], "\n\n")
To print only selected groups, put the keys you want in key_list_from_gb (gb.keys() lists the available keys). For example:
gb = grouped_df.groups
gb.keys()
key_list_from_gb = [key1, key2, key3]  # replace with your actual keys
for key, values in gb.items():
    if key in key_list_from_gb:
        print(df.loc[values], "\n")
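For a concrete run of the selective approach on the question's dataframe, here is a small sketch. The names wanted_keys and selected are illustrative, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three', 'three', 'one'],
                   'B': range(6)})

grouped_df = df.groupby('A')
gb = grouped_df.groups  # mapping: group key -> index labels of its rows

wanted_keys = ['one', 'two']  # hypothetical selection of groups to print
selected = {}
for key, labels in gb.items():
    if key in wanted_keys:
        # loc selects the rows of the original frame by index label
        selected[key] = df.loc[labels]
        print(selected[key], "\n")
```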
Upvotes: 154
Reputation: 195
You just need to convert the DataFrameGroupBy object to a list and you can simply print it:
ls_grouped_df = list(df.groupby('A'))
print(ls_grouped_df)
Upvotes: 0
Reputation: 2255
This is a better general purpose answer. This function will print all group names and values, or optionally selects one or more groups for display.
def print_pd_groupby(X, grp=None):
    '''Display contents of a pandas groupby object
    :param X: pandas groupby object
    :param grp: a list with one or more group names
    '''
    if grp is None:
        for k, i in X:
            print("group:", k)
            print(i)
    else:
        for j in grp:
            print("group:", j)
            print(X.get_group(j))
For your example, here is the session output:
In [116]: df = pd.DataFrame({'A': ['one', 'one', 'two', 'three', 'three', 'one'], 'B': range(6)})
In [117]: dfg = df.groupby('A')
In [118]: print_pd_groupby(dfg)
group: one
A B
0 one 0
1 one 1
5 one 5
group: three
A B
3 three 3
4 three 4
group: two
A B
2 two 2
In [119]: print_pd_groupby(dfg, grp = ["one", "two"])
group: one
A B
0 one 0
1 one 1
5 one 5
group: two
A B
2 two 2
This is a better answer because a function is reusable: put it in your package or function collection and you never have to rewrite that "scriptish" approach again.
IMHO, something like this should be a built-in method of the pandas groupby object.
Upvotes: 3
Reputation: 414
In Jupyter Notebook, if you do the following, it prints a nice grouped version of the object. The apply method helps create a MultiIndex dataframe.
by = 'A' # groupby 'by' argument
df.groupby(by).apply(lambda a: a[:])
Output:
A B
A
one 0 one 0
1 one 1
5 one 5
three 3 three 3
4 three 4
two 2 two 2
If you want the by column(s) not to appear in the output, just drop the column(s), like so.
df.groupby(by).apply(lambda a: a.drop(by, axis=1)[:])
Output:
B
A
one 0 0
1 1
5 5
three 3 3
4 4
two 2 2
Here, I am not sure why .iloc[:] does not work in place of [:] at the end. If this causes problems now or after future pandas updates, .iloc[:len(a)] also works.
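Because the way apply treats the grouping column has changed across pandas releases, an alternative that avoids apply may be useful. The sketch below is a substitute technique, not the approach from this answer: it collects the groups into a dict and stacks them with pd.concat, which uses the dict keys as the outer index level. The names pieces and stacked are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three', 'three', 'one'],
                   'B': range(6)})

# iterating a GroupBy yields (key, sub-dataframe) pairs
pieces = {key: group for key, group in df.groupby('A')}

# concat on a dict builds a MultiIndex: (group key, original row label)
stacked = pd.concat(pieces)
print(stacked)
```

The result is an ordinary DataFrame, so it prints with one block of rows per group key.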
Upvotes: 31
Reputation: 21
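To print all (or arbitrarily many) rows of the grouped df:
import pandas as pd
pd.set_option('display.max_rows', 500)
grouped_df = df.groupby(['var1', 'var2'])
# display.max_rows applies when a DataFrame or Series is displayed,
# e.g. the result of grouped_df.head(); the GroupBy object itself prints only its repr
print(grouped_df)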
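If you would rather not change the display setting globally, pd.option_context applies it only inside a with block. The sketch below uses made-up data (a hypothetical 100-row frame with columns var1 and value) and sorts by the key so that rows of the same group sit together:

```python
import pandas as pd

# hypothetical data: 100 rows spread over 4 groups
big = pd.DataFrame({'var1': [i % 4 for i in range(100)],
                    'value': range(100)})

# a stable sort keeps the original row order within each group
long_df = big.sort_values('var1', kind='stable')

# the higher row limit is active only inside the with block
with pd.option_context('display.max_rows', 500):
    full_text = repr(long_df)

print(full_text)
```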
Upvotes: 0
Reputation: 1495
In addition to previous answers:
Taking your example,
df = pd.DataFrame({'A': ['one', 'one', 'two', 'three', 'three', 'one'], 'B': range(6)})
Then a simple one-liner does it:
df.groupby('A').apply(print)
Upvotes: 89
Reputation: 171
Call list() on the GroupBy object
print(list(df.groupby('A')))
gives you:
[('one', A B
0 one 0
1 one 1
5 one 5), ('three', A B
3 three 3
4 three 4), ('two', A B
2 two 2)]
Upvotes: 5
Reputation: 61
You cannot see the grouped data directly with a print statement, but you can see it by iterating over the groups with a for loop. Try this code:
group = df.groupby('A')  # group holds the GroupBy object
for A, A_df in group:  # A is the group key; A_df is the sub-dataframe for that key
    print(A)
    print(A_df)
This prints each group key followed by its rows.
I hope it helps.
Upvotes: 2
Reputation: 697
I found a tricky way, just for brainstorm, see the code:
df['a'] = df['A'] # create a shadow column for MultiIndexing
df.sort_values('A', inplace=True)
df.set_index(["A","a"], inplace=True)
print(df)
the output:
B
A a
one one 0
one 1
one 5
three three 3
three 4
two two 2
The advantage is that it is easy to print, because it returns a dataframe instead of a GroupBy object, and the output looks nice. The drawback is that it creates a column of redundant data.
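The code above modifies df in place. A possible variant that leaves the original dataframe untouched, sketched here with method chaining (the name display_df is illustrative):

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three', 'three', 'one'],
                   'B': range(6)})

display_df = (
    df.assign(a=df['A'])                 # shadow column, added to a copy
      .sort_values('A', kind='stable')   # stable sort keeps row order within a group
      .set_index(['A', 'a'])             # both levels become the MultiIndex
)
print(display_df)
```

Here df keeps its original columns and row order, while display_df carries the grouped layout.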
Upvotes: 1
Reputation: 9976
Thanks to Surya for good insights. I'd clean up his solution and simply do:
for key, value in df.groupby('A'):
    print(key, value)
Upvotes: 6
Reputation: 2208
Another simple alternative:
for name_of_the_group, group in grouped_dataframe:
    print(name_of_the_group)
    print(group)
Upvotes: 16
Reputation: 1217
If you're simply looking for a way to display it, you could use describe():
grp = df.groupby('colName')
grp.describe()
This gives you a neat table.
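Applied to the question's dataframe, this might look like the sketch below. Keep in mind that describe() shows summary statistics per group (count, mean, std, quartiles), not the rows themselves:

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three', 'three', 'one'],
                   'B': range(6)})

# one row per group; columns are a MultiIndex such as ('B', 'count'), ('B', 'mean')
summary = df.groupby('A').describe()
print(summary)

# individual statistics can be read with a (column, statistic) pair
print(summary.loc['one', ('B', 'count')])
```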
Upvotes: 98
Reputation: 11568
Another simple alternative could be:
gb = df.groupby("A")
gb.count() # or,
gb.get_group(your_key)
Upvotes: 11
Reputation: 35235
I confirmed that the behavior of head() changes between versions 0.12 and 0.13. That looks like a bug to me. I created an issue.
But a groupby operation doesn't actually return a DataFrame sorted by group. The .head() method is a little misleading here: it's just a convenience feature to let you re-examine the object (in this case, df) that you grouped. The result of groupby is a separate kind of object, a GroupBy object. You must apply, transform, or filter to get back to a DataFrame or Series.
If all you wanted to do was sort by the values in column A, you should use df.sort_values('A') (df.sort('A') in the old pandas versions discussed here).
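A minimal sketch of the sorting route on the question's dataframe. The kind='stable' argument is an addition here, chosen so that rows keep their original relative order within each value of A:

```python
import pandas as pd

df = pd.DataFrame({'A': ['one', 'one', 'two', 'three', 'three', 'one'],
                   'B': range(6)})

# returns a new, sorted DataFrame; df itself is unchanged
sorted_df = df.sort_values('A', kind='stable')
print(sorted_df)
```

The groups come out in alphabetical order of the key (one, three, two), not in order of first appearance.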
Upvotes: 17