How to optimize code to pivot table in python pandas

Question

I created a DataFrame in Python pandas that matches companies (A, B, C) to record_ids using four matching strings (type_1, type_2, type_3, and type_4). It looks like this:

    vendor  match_type  record_id    percent  cumulative_percent
0        A      type_1       2974  26.348897           26.348897
1        A      type_2        275   2.436431           28.785328
2        A      type_3        214   1.895987           30.681315
3        A      type_4       2341  20.740675           51.421990
4        B      type_1        440   3.898290           55.320280
5        B      type_2         39   0.345530           55.665810
6        B      type_3         54   0.478427           56.144237
7        B      type_4        596   5.280411           61.424648
8        C      type_1        399   3.535040           64.959688
9        C      type_2         70   0.620183           65.579871
10       C      type_3         44   0.389829           65.969700
11       C      type_4        262   2.321255           68.290954
12     NaN         NaN       3579  31.709046          100.000000

Where:

- the record_id column contains the number of matching record_ids;
- row 12 holds the records that did not match any of the records of companies A, B, or C;
- percent is each row's record_id count divided by the total number of record_ids, multiplied by 100;
- cumulative_percent is a running total of percent.
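A minimal sketch that rebuilds the input DataFrame from the counts above and derives percent and cumulative_percent from record_id, following the definitions just given. It assumes the unmatched row stores real NaN values in vendor and match_type; the printout alone cannot distinguish that from the literal string "NaN".

```python
import numpy as np
import pandas as pd

# Counts copied from the table above; the last row (vendor and match_type
# both NaN) holds the record_ids that matched no company.
df = pd.DataFrame({
    "vendor": ["A"] * 4 + ["B"] * 4 + ["C"] * 4 + [np.nan],
    "match_type": ["type_1", "type_2", "type_3", "type_4"] * 3 + [np.nan],
    "record_id": [2974, 275, 214, 2341, 440, 39, 54, 596, 399, 70, 44, 262, 3579],
})

# percent: each row's count as a share of all record_ids, times 100
df["percent"] = df["record_id"] / df["record_id"].sum() * 100
# cumulative_percent: running total of percent, top to bottom
df["cumulative_percent"] = df["percent"].cumsum()

print(df)
```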

I want to pivot the table to look like this:

match_type   type_1  type_2  type_3  type_4  No Match  Grand Total  percent  cumulative percent
vendor
A              2974     275     214    2341                   5804    51.4%               51.4%
B               440      39      54     596                   1129    10.0%               61.4%
C               399      70      44     262                    775     6.9%               68.3%
NaN                                              3579         3579    31.7%              100.0%
Grand Total    3813     384     312    3199      3579        11287   100.0%

The problem is that the pivot took a lot of code. I couldn't include the percent and cumulative_percent columns in the pivot_table call, so I had to recompute them. I also had to reorder both the columns and the rows.
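One possible shorter route, sketched under the assumption that the unmatched row holds real NaN values: label the missing keys with fillna before pivoting (pivot_table drops NaN keys by default), so only one reindex is needed, then derive both percentage columns from the "Grand Total" margin. The name `labelled` is illustrative. If the row actually holds the string "NaN", fillna has no effect and match_type would need a `replace("NaN", "No Match")` instead.

```python
import numpy as np
import pandas as pd

# Rebuild the question's input; assumes the unmatched row holds real NaN.
df = pd.DataFrame({
    "vendor": ["A"] * 4 + ["B"] * 4 + ["C"] * 4 + [np.nan],
    "match_type": ["type_1", "type_2", "type_3", "type_4"] * 3 + [np.nan],
    "record_id": [2974, 275, 214, 2341, 440, 39, 54, 596, 399, 70, 44, 262, 3579],
})

# Name the unmatched keys before pivoting: pivot_table drops NaN keys by
# default, and this also gives the "No Match" column its final label.
labelled = df.fillna({"vendor": "NaN", "match_type": "No Match"})

tbl = pd.pivot_table(labelled, values="record_id", index="vendor",
                     columns="match_type", aggfunc="sum",
                     margins=True, margins_name="Grand Total")

# Rows already sort as A, B, C, NaN, Grand Total; only "No Match" needs moving.
tbl = tbl.reindex(columns=["type_1", "type_2", "type_3", "type_4",
                           "No Match", "Grand Total"])

# The bottom-right margin cell is the overall total of record_id.
pct = tbl["Grand Total"] / tbl.loc["Grand Total", "Grand Total"] * 100
tbl["percent"] = pct.round(1)
# Running total over the vendor rows only; round after summing.
tbl["cumulative percent"] = pct.drop("Grand Total").cumsum().round(1)

print(tbl)
```

The percent columns stay numeric here; the `astype(str) + "%"` step from the code below can still be applied afterwards if text labels are wanted.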

Can anyone show me how to condense this into fewer lines of Python code? Here is the code I wrote to produce the pivoted table shown above:

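import pandas as pd

# Assumes the unmatched row stores the literal string "NaN" in vendor and
# match_type (pivot_table drops real NaN keys by default)
tbl = pd.pivot_table(df, values="record_id", index="vendor", columns="match_type",
                     aggfunc="sum", fill_value="", margins=True, margins_name="Grand Total")
# Reorder the columns, rename the unmatched column, then reorder the rows
column_order = ["type_1", "type_2", "type_3", "type_4", "NaN", "Grand Total"]
tbl = tbl.reindex(column_order, axis=1)
tbl = tbl.rename(columns={"NaN": "No Match"})
row_order = ["A", "B", "C", "NaN", "Grand Total"]
tbl = tbl.reindex(row_order, axis=0)
# Recompute the percentages from the Grand Total column (first four rows)
total = tbl["Grand Total"].iloc[0:4].sum()
pct = tbl["Grand Total"] / total * 100.0
tbl["percent"] = pct.round(1).astype(str) + "%"
# Round after the running sum so float error doesn't leak into the strings
tbl["cumulative percent"] = pct.iloc[0:4].cumsum().round(1).astype(str) + "%"
# Use .loc: chained assignment (tbl[col].iloc[4] = ...) may not update tbl
tbl.loc["Grand Total", "cumulative percent"] = ""
tbl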

Thanks in advance.
