user1855463
user1855463

Reputation: 67

Stack data from columns to rows in pandas dataframe

I am trying to stack financial values year wise in pandas Dataframe. But not able to get started.

All I have tried is

df1 = df.set_index(['refnum','y1gp','y2gp','y3gp']).stack()\
.reset_index(name='REV').rename(columns={'level_5':'Year'})

Existing:

refnum y1 y1rev y1gp y2 y2rev y2gp y3 y3rev y3gp
10001 2021 300 200 2022 100 600 2023 300 300
10002 2020 300 200 2021 200 500 2022 300 300
10003 2021 300 200 2022 500 500 2023 300 300

Expected:

refnum year REV GP Base Year
10001 2021 300 200 BaseYear
10001 2022 100 600 BaseYear+1
10001 2023 300 300 BaseYear+2
10002 2020 300 200 BaseYear
10002 2021 200 500 BaseYear+1
10002 2022 300 300 BaseYear+2
10003 2021 300 200 BaseYear
10003 2022 500 500 BaseYear+1
10003 2023 300 300 BaseYear+2

Upvotes: 1

Views: 798

Answers (3)

sammywemmy
sammywemmy

Reputation: 28644

You could use pivot_longer from pyjanitor; for this scenario, you pass regexes to names_pattern, and the new column names in names_to:

# pip install pyjanitor
import janitor
import pandas as pd
df.pivot_longer(index='refnum', 
                names_to=['year', 'REV', 'GP'], 
                names_pattern=['^y\d$', '.*rev$', '.*gp$']
               )

   refnum  year  REV   GP
0   10001  2021  300  200
1   10002  2020  300  200
2   10003  2021  300  200
3   10001  2022  100  600
4   10002  2021  200  500
5   10003  2022  500  500
6   10001  2023  300  300
7   10002  2022  300  300
8   10003  2023  300  300

If you want the base year included, you can modify the column labels that end with a number, before using pivot_longer:

(df.rename(columns = lambda col: f"{col}YEAR" 
                                 if col.endswith(('1','2','3')) 
                                 else col)
   .pivot_longer(index='refnum', 
                 names_to= ("Base Year", ".value"), 
                 names_pattern=r".(\d)(.+)", 
                 sort_by_appearance=True)
 )

   refnum Base Year  YEAR  rev   gp
0   10001         1  2021  300  200
1   10001         2  2022  100  600
2   10001         3  2023  300  300
3   10002         1  2020  300  200
4   10002         2  2021  200  500
5   10002         3  2022  300  300
6   10003         1  2021  300  200
7   10003         2  2022  500  500
8   10003         3  2023  300  300

labels associated with .value stay as column headers, while the rest are lumped into a new column (base year)

Upvotes: 2

Andrej Kesely
Andrej Kesely

Reputation: 195418

Try:

df.columns = [re.sub(r"y(\d+)(.*)", r"\2-\1", c) for c in df.columns]
x = (
    pd.wide_to_long(
        df, stubnames=["", "gp", "rev"], sep="-", i="refnum", j="Base Year"
    )
    .rename(columns={"": "year"})
    .reset_index()
    .sort_values(by="refnum")
)
print(x)

Prints:

   refnum  Base Year  year   gp  rev
0   10001          1  2021  200  300
3   10001          2  2022  600  100
6   10001          3  2023  300  300
1   10002          1  2020  200  300
4   10002          2  2021  500  200
7   10002          3  2022  300  300
2   10003          1  2021  200  300
5   10003          2  2022  500  500
8   10003          3  2023  300  300

Upvotes: 4

Henry Ecker
Henry Ecker

Reputation: 35626

Let's convert the headers to a useable MultiIndex using str.replace and str.split then stack to go from wide-form to long. Then groupby cumcount to create the BaseYear column.

# Save Columns
df = df.set_index('refnum')
# Create a MultiIndex with Numbers at the end and split into multiple levels
df.columns = (
    df.columns.str.replace(r'^(.*?)(\d+)(.*)$', r'\1\3/\2', regex=True)
        .str.split('/', expand=True)
)
# Wide Format to Long + Rename Columns
df = df.stack().droplevel(-1).reset_index().rename(
    columns={'y': 'Year', 'ygp': 'GP', 'yrev': 'REV'}
)
# Add Base Year Column
df['BaseYear'] = "BaseYear+" + df.groupby('refnum').cumcount().astype(str)
# df['BaseYear'] = df.groupby('refnum').cumcount()  # (int version)

df:

   refnum  Year   GP  REV    BaseYear
0   10001  2021  200  300  BaseYear+0
1   10001  2022  600  100  BaseYear+1
2   10001  2023  300  300  BaseYear+2
3   10002  2020  200  300  BaseYear+0
4   10002  2021  500  200  BaseYear+1
5   10002  2022  300  300  BaseYear+2
6   10003  2021  200  300  BaseYear+0
7   10003  2022  500  500  BaseYear+1
8   10003  2023  300  300  BaseYear+2

Upvotes: 3

Related Questions