jenny
jenny

Reputation: 105

print 3 columns from pandas data set in a table

Say I have this data:

project:  group:  sum:
A         John    12
A         Sam     10
B         Sun     4
B         Toy     5
B         Joy     7
C         Sam     11

The data is in data set frame_main. I wanted to sum up by project so I did:

result_main = pd.concat(frame_main).groupby(["project","group"]).sum()

It basically doing what I wanted, which is summing up the third column and group by the first:

project:  group:  sum:
A         John    12
          Sam     10
B         Sun     4
          Toy     5
          Joy     7
C         Sam     11

But now when I'm trying to print it using the following:

print(tabulate(result_main, headers="keys", tablefmt='psql'))

It prints like that:

+---------------------------+-----------------+                                                                                       
|                           |   sum:          |                                                                                       
|---------------------------+-----------------|                                                                                       
| ('A', 'John')             |             12  |                                                                                       
| ('A', 'Sam')              |             10  |                                                                                       
| ('B', 'Sun')              |             4   |
| ('B', 'Toy')              |             5   |                                                                                       
| ('B', 'Joy')              |             7   |                                                                                       
| ('C', 'Sam')              |             11  |

How can I print so it would look like the output above? I need 3 columns and grouped by the first.

Upvotes: 5

Views: 1614

Answers (4)

Scott Boston
Scott Boston

Reputation: 153460

Much like @Craig we can mask those duplicate value in 'project:' column.

df_sum = df_sum.reset_index()
df_sum['project:'] = df_sum['project:'].mask(df_sum['project:'].duplicated(),'')

print(df_sum.set_index('project:').to_markdown(tablefmt='psql'))

Output:

+------------+----------+--------+
| project:   | group:   |   sum: |
|------------+----------+--------|
| A          | John     |     12 |
|            | Sam      |     10 |
| B          | Sun      |      4 |
|            | Toy      |      5 |
|            | Joy      |      7 |
| C          | Sam      |     11 |
+------------+----------+--------+

Upvotes: 4

Craig
Craig

Reputation: 4855

The dataframe created after the .sum() operation has a multi-index. If you reset the index you will get a table that looks the way you want. The .reset_index() method allows you to specify which index you want to drop (and keeps the others). In your case, you want to keep the first index but not the second, so use .reset_index(level=[1]) as shown here:

import pandas as pd
from tabulate import tabulate

result_main = pd.concat(frame_main).groupby(["project","group"]).sum()
result_main = result_main.reset_index(level=[1])
print(tabulate(result_main, headers="keys", tablefmt='psql'))

Will produce:

+-----------+---------+-------+
| project   | group   |   sum |
|-----------+---------+-------|
| A         | John    |    12 |
| A         | Sam     |    10 |
| B         | Joy     |     7 |
| B         | Sun     |     4 |
| B         | Toy     |     5 |
| C         | Sam     |    11 |
+-----------+---------+-------+

As Scott Boston mentions in a comment, since Pandas 1.0.0, you can use the .to_markdown() method instead of tabulate to achieve the same thing, but you still have to reset the index to get three columns.

This is equivalent to the last line of the code above:

print(result_main.to_markdown(tablefmt='psql'))

Upvotes: 2

BENY
BENY

Reputation: 323276

We can do

df.index=df.index.values
df
Out[350]: 
           sum
(A, John)   12
(A, Sam)    10
(B, Sun)     4
(B, Toy)     5
(B, Joy)     7
(C, Sam)    11

Upvotes: 1

Quang Hoang
Quang Hoang

Reputation: 150745

This looks silly but does exactly what you need:

result_main.index = [a for a in result_main.index]

Output:

           sum
(A, John)   12
(A, Sam)    10
(B, Joy)     7
(B, Sun)     4
(B, Toy)     5
(C, Sam)    11

Upvotes: 1

Related Questions