stochastic learner
stochastic learner

Reputation: 199

How to convert simple pandas dataframe to multi-index/column dataframe?

I have the following data frame

AAPL INCOME STATEMENT (in mln.) Revenue COGS Gross Profit Gross Profit ratio Research and Development Exp. General and Administrative Exp. Selling and Marketing Exp. Selling, General and Administrative Exp. Other Expenses Operating Expenses COGS and Expenses Interest Expenese Depreciation and Amortization EBITDA EBITDA ratio Operating Income Operating Income ratio Total Other Income Expenses Net Income Before Tax Income Before Tax ratio Income Tax expense Net Income Net Income ratio EPS EPS Diluted Weighted Average Shares Outstanding Weighted Average Shares Outstanding Diluted BALANCE SHEET (in mln.) Cash and Cash Equivalents Short-Term Investments Cash and Short-Term Investments Net Receivables Inventory Other Current Assets Total Current Assets PP&E Goodwill Intangible Assets Goodwill and Intangible Assets Investments Tax Assets Other Non-Current Assets Total Non-Current Assets Other Assets Total Assets Accounts Payable Short-Term Debt Tax Payable Deferred Revenue Other Current Liabilities Total Current Liabilities Long-Term Debt Deferred Revenue Deferred Tax Liabilities Other Non-Current Liabilities Total Non-Current Liabilities Other Liabilities Total Liabilities Common Stock Retained Earnings Other Comprehensive Income/Loss Other Total Stockholders Equity Total Stockholders Equity Total Liabilities And Stockholders Equity CASH FLOW STATEMENT (in mln.) Net Income Depreciation and Amortization Deferred Income Tax Stock Based Compensation Change in Working Capital Accounts Receivable Inventory Accounts Payable Other Working Capital Other Non-Cash Items Cash Provided by Operating Activities CAPEX Acquisitions Net Purchases of Investments Sales/Maturities of Investments Other Investing Activites Cash Used for Investing Activites Debt Repayment Common Stock Issued Common Stock Repurchased Dividends Paid Other Financing Activites Cash Used/Provided by Financing Activities Effect of Forex Changes on Cash Net Change In Cash Cash at the Beginning of Period Cash at the End of Period Free Cash Flow
2020 274,515 169,559 104,956 38.23% 18,752 - - - - 19,916 - - 38,668 208,227 2,873 11,056 81,020 29.51% 66,288 24.15% 803 67,091 24.44% 9,680 57,411 20.91% 3.31 3.28 17,352 17,528 38,016 52,927 90,943 37,445 4,061 11,264 143,713 36,766 - - - - - - 100,887 - - 42,522 180,175 - - 323,888 42,296 13,769 - - 6,643 42,684 105,392 98,667 - - - - 54,490 153,157 - - 258,549 50,779 14,966 (406) - - 65,339 323,888 57,411 11,056 (215) 6,829 5,690 6,917 (127) (4,062) 2,081 (97) 80,674 (7,309) (1,524) (115,148) 120,483 (791) (4,289) (12,629) 880 (72,358) (14,081) 11,368 (86,820) - - (10,435) 50,224 39,789 73,365
2021 365,817 212,981 152,836 41.78% 21,914 - - - - 21,973 - - 43,887 256,868 2,645 11,284 123,136 33.66% 108,949 29.78% 258 109,207 29.85% 14,527 94,680 25.88% 5.67 5.61 16,701 16,865 34,940 27,699 62,639 51,506 6,580 14,111 134,836 39,440 - - - - - - 127,877 - - 48,849 216,166 - - 351,002 54,763 15,613 - - 7,612 47,493 125,481 109,106 - - - - 53,325 162,431 - - 287,912 57,365 5,562 163 - - 63,090 351,002 94,680 11,284 (4,774) 7,906 (4,911) (10,125) (2,642) 12,326 1,676 (147) 104,038 (11,085) (33) (109,689) 106,870 (608) (14,545) (8,750) 1,105 (85,971) (14,467) 14,730 (93,353) - - (3,860) 39,789 35,929 92,953

Now I want to have this dataframe with multi-columns. Something similar to given below. The columns "Revenue" to "Total Liabilities And Stockholders Equity" should fall under "income statement" and so on. I couldn't wrap my head around to make it work using the documentation given here. Your help will be highly appreciated.

INCOME STATEMENT (in mln.) BALANCE SHEET (in mln.) CASH FLOW STATEMENT (in mln.)
AAPL Revenue COGS Gross Profit Gross Profit ratio Research and Development Exp. General and Administrative Exp. Selling and Marketing Exp. Selling, General and Administrative Exp. Other Expenses Operating Expenses COGS and Expenses Interest Expenese Depreciation and Amortization EBITDA EBITDA ratio Operating Income Operating Income ratio Total Other Income Expenses Net Income Before Tax Income Before Tax ratio Income Tax expense Net Income Net Income ratio EPS EPS Diluted Weighted Average Shares Outstanding Weighted Average Shares Outstanding Diluted Cash and Cash Equivalents Short-Term Investments Cash and Short-Term Investments Net Receivables Inventory Other Current Assets Total Current Assets PP&E Goodwill Intangible Assets Goodwill and Intangible Assets Investments Tax Assets Other Non-Current Assets Total Non-Current Assets Other Assets Total Assets Accounts Payable Short-Term Debt Tax Payable Deferred Revenue Other Current Liabilities Total Current Liabilities Long-Term Debt Deferred Revenue Deferred Tax Liabilities Other Non-Current Liabilities Total Non-Current Liabilities Other Liabilities Total Liabilities Common Stock Retained Earnings Other Comprehensive Income/Loss Other Total Stockholders Equity Total Stockholders Equity Total Liabilities And Stockholders Equity Net Income Depreciation and Amortization Deferred Income Tax Stock Based Compensation Change in Working Capital Accounts Receivable Inventory Accounts Payable Other Working Capital Other Non-Cash Items Cash Provided by Operating Activities CAPEX Acquisitions Net Purchases of Investments Sales/Maturities of Investments Other Investing Activites Cash Used for Investing Activites Debt Repayment Common Stock Issued Common Stock Repurchased Other Financing Activites Effect of Forex Changes on Cash Cash at the Beginning of Period Free Cash Flow
2020 274,515 169,559 104,956 38.23% 18,752 - - - - 19,916 - - 38,668 208,227 2,873 11,056 81,020 29.51% 66,288 24.15% 803 67,091 24.44% 9,680 57,411 20.91% 3.31 3.28 17,352 17,528 38,016 52,927 90,943 37,445 4,061 11,264 143,713 36,766 - - - - - - 100,887 - - 42,522 180,175 - - 323,888 42,296 13,769 - - 6,643 42,684 105,392 98,667 - - - - 54,490 153,157 - - 258,549 50,779 14,966 (406) - - 65,339 323,888 57,411 11,056 (215) 6,829 5,690 6,917 (127) (4,062) 2,081 (97) 80,674 (7,309) (1,524) (115,148) 120,483 (791) (4,289) (12,629) 880 (72,358) 11,368 - - 50,224 73,365
2021 365,817 212,981 152,836 41.78% 21,914 - - - - 21,973 - - 43,887 256,868 2,645 11,284 123,136 33.66% 108,949 29.78% 258 109,207 29.85% 14,527 94,680 25.88% 5.67 5.61 16,701 16,865 34,940 27,699 62,639 51,506 6,580 14,111 134,836 39,440 - - - - - - 127,877 - - 48,849 216,166 - - 351,002 54,763 15,613 - - 7,612 47,493 125,481 109,106 - - - - 53,325 162,431 - - 287,912 57,365 5,562 163 - - 63,090 351,002 94,680 11,284 (4,774) 7,906 (4,911) (10,125) (2,642) 12,326 1,676 (147) 104,038 (11,085) (33) (109,689) 106,870 (608) (14,545) (8,750) 1,105 (85,971) 14,730 - - 39,789 92,953

Upvotes: 1

Views: 85

Answers (1)

Laurent
Laurent

Reputation: 13458

With the following toy dataframe mimicking yours:

import pandas as pd

df = pd.DataFrame(
    {
        "Status": [2020, 2021],
        "INCOME STATEMENT": ["", ""],
        "Revenue": [274515, 212981],
        "Operating Expenses": [38668, 43887],
        "COGS": [208227, 256868],
        "Cash": [38016, 34940],
        "BALANCE SHEET": ["", ""],
        "Inventory": [4061, 6580],
        "Investments": [100887, 127887],
    }
)

print(df)
# Output
   Status INCOME STATEMENT  Revenue  Operating Expenses    COGS   Cash  \
0    2020                    274515               38668  208227  38016   
1    2021                    212981               43887  256868  34940   

  BALANCE SHEET  Inventory  Investments  
0                     4061       100887  
1                     6580       127887 

Here is one way to do it:

# Define levels for multiIndex dataframe
COL_LVL_0 = ["", "INCOME STATEMENT", "BALANCE SHEET"]
STARTING_COLS = ["Status", "COGS", "Inventory"]

# Remove empty columns
df = df.loc[:, ~df.columns.isin(COL_LVL_0)]

# Slice dataframe columns list to get level 1 labels for each level 0 label
indices = [df.columns.tolist().index(value) for value in STARTING_COLS]
cols_lvl_1 = []
for idx in indices:
    if idx == 0:
        start = 0
        continue
    cols_lvl_1.append(df.columns.tolist()[start:idx])
    start = idx
cols_lvl_1.append(df.columns.tolist()[start:])

# Setup new index
df.columns = pd.MultiIndex.from_tuples(
    ((lvl0, lvl1) for lvl0, item in zip(COL_LVL_0, cols_lvl_1) for lvl1 in item)
)

So that:

print(df)
# Output
                                    INCOME STATEMENT        BALANCE SHEET  \
  Status Revenue Operating Expenses             COGS   Cash     Inventory   
0   2020  274515              38668           208227  38016          4061   
1   2021  212981              43887           256868  34940          6580   


  Investments  
0      100887  
1      127887  

Upvotes: 1

Related Questions