Reputation: 105
Say I have this data:
project: group: sum:
A John 12
A Sam 10
B Sun 4
B Toy 5
B Joy 7
C Sam 11
The data is in data set frame_main. I wanted to sum up by project so I did:
result_main = pd.concat(frame_main).groupby(["project","group"]).sum()
It basically doing what I wanted, which is summing up the third column and group by the first:
project: group: sum:
A John 12
Sam 10
B Sun 4
Toy 5
Joy 7
C Sam 11
But now when I'm trying to print it using the following:
print(tabulate(result_main, headers="keys", tablefmt='psql'))
It prints like that:
+---------------------------+-----------------+
| | sum: |
|---------------------------+-----------------|
| ('A', 'John') | 12 |
| ('A', 'Sam') | 10 |
| ('B', 'Sun') | 4 |
| ('B', 'Toy') | 5 |
| ('B', 'Joy') | 7 |
| ('C', 'Sam') | 11 |
How can I print so it would look like the output above? I need 3 columns and grouped by the first.
Upvotes: 5
Views: 1614
Reputation: 153460
Much like @Craig we can mask those duplicate value in 'project:' column.
df_sum = df_sum.reset_index()
df_sum['project:'] = df_sum['project:'].mask(df_sum['project:'].duplicated(),'')
print(df_sum.set_index('project:').to_markdown(tablefmt='psql'))
Output:
+------------+----------+--------+
| project: | group: | sum: |
|------------+----------+--------|
| A | John | 12 |
| | Sam | 10 |
| B | Sun | 4 |
| | Toy | 5 |
| | Joy | 7 |
| C | Sam | 11 |
+------------+----------+--------+
Upvotes: 4
Reputation: 4855
The dataframe created after the .sum()
operation has a multi-index. If you reset the index you will get a table that looks the way you want. The .reset_index()
method allows you to specify which index you want to drop (and keeps the others). In your case, you want to keep the first index but not the second, so use .reset_index(level=[1])
as shown here:
import pandas as pd
from tabulate import tabulate
result_main = pd.concat(frame_main).groupby(["project","group"]).sum()
result_main = result_main.reset_index(level=[1])
print(tabulate(result_main, headers="keys", tablefmt='psql'))
Will produce:
+-----------+---------+-------+
| project | group | sum |
|-----------+---------+-------|
| A | John | 12 |
| A | Sam | 10 |
| B | Joy | 7 |
| B | Sun | 4 |
| B | Toy | 5 |
| C | Sam | 11 |
+-----------+---------+-------+
As Scott Boston mentions in a comment, since Pandas 1.0.0, you can use the .to_markdown()
method instead of tabulate
to achieve the same thing, but you still have to reset the index to get three columns.
This is equivalent to the last line of the code above:
print(result_main.to_markdown(tablefmt='psql'))
Upvotes: 2
Reputation: 323276
We can do
df.index=df.index.values
df
Out[350]:
sum
(A, John) 12
(A, Sam) 10
(B, Sun) 4
(B, Toy) 5
(B, Joy) 7
(C, Sam) 11
Upvotes: 1
Reputation: 150745
This looks silly but does exactly what you need:
result_main.index = [a for a in result_main.index]
Output:
sum
(A, John) 12
(A, Sam) 10
(B, Joy) 7
(B, Sun) 4
(B, Toy) 5
(C, Sam) 11
Upvotes: 1