agustin
agustin

Reputation: 1351

Python - pandas: create new columns and transpose depending on column names

Given the following dataframe definition,

data = pd.DataFrame({"sec":["dim1", "dim2", "dim3"],  # Construct a pandas DataFrame
                     "m1vc_15":[1,2,3],
                     "m1vu_15":[10,11,12],
                     "m2vc_15":[20,21,22],
                     "m2vu_15":[30,31,32],
                     "m1vc_16":[4,5,6],
                     "m1vu_16":[13,14,15],
                     "m2vc_16":[23,24,25],
                     "m2vu_16":[33,34,35]})

which generates this table structure:

sec  m1vc_15  m1vu_15  m2vc_15  m2vu_15  m1vc_16  m1vu_16  m2vc_16  m2vu_16
dim1        1       10       20       30        4       13       23       33
dim2        2       11       21       31        5       14       24       34
dim3        3       12       22       32        6       15       25       35

Is there any function or package that allows me to create this expected outcome?

data2 = pd.DataFrame({"sec":np.repeat(["dim1", "dim2", "dim3", "dim1", "dim2", "dim3"],2),  # Construct a pandas DataFrame
                     "type":np.repeat(['vc','vu'],6),
                     "year":["2015", "2016", "2015", "2016", "2015", "2016",
                             "2015", "2016", "2015", "2016", "2015", "2016"],
                     "m1":[1, 4, 2, 5, 3, 6, 10, 13, 11, 14, 12, 15],
                     "m2":[20, 23, 21, 24, 22, 25, 30, 33, 31, 34, 32, 35]})
""" 
# Output
sec type year  m1  m2
dim1   vc 2015   1  20
dim1   vc 2016   4  23
dim2   vc 2015   2  21
dim2   vc 2016   5  24
dim3   vc 2015   3  22
dim3   vc 2016   6  25
dim1   vu 2015  10  30
dim1   vu 2016  13  33
dim2   vu 2015  11  31
dim2   vu 2016  14  34
dim3   vu 2015  12  32
dim3   vu 2016  15  35
"""

The logic for the transpose is:

Upvotes: 2

Views: 525

Answers (2)

sammywemmy
sammywemmy

Reputation: 28699

One efficient option is with pivot_longer from pyjanitor:

# pip install pyjanitor
import pandas as pd
import janitor

(data
.pivot_longer(
    index = 'sec', 
    names_to = ('.value', 'type', 'year'), 
    names_pattern = r"(.\d)(.+)_(\d+)")
.assign(year = lambda df: ("20"+df.year).astype(int))
)
     sec type  year  m1  m2
0   dim1   vc  2015   1  20
1   dim2   vc  2015   2  21
2   dim3   vc  2015   3  22
3   dim1   vu  2015  10  30
4   dim2   vu  2015  11  31
5   dim3   vu  2015  12  32
6   dim1   vc  2016   4  23
7   dim2   vc  2016   5  24
8   dim3   vc  2016   6  25
9   dim1   vu  2016  13  33
10  dim2   vu  2016  14  34
11  dim3   vu  2016  15  35

The .value determines which columns remain as header, which in this case is m1/m2; the rest are lumped into the type and year columns

A faster option would be to convert the year column into an integer within pivot_longer, by using the names_transform option:

(data
.pivot_longer(
    index = 'sec', 
    names_to = ('.value', 'type', 'year'), 
    names_pattern = r"(.\d)(.+)_(\d+)",
    names_transform = {'year': int})
.assign(year = lambda df: df.year + 2000)
)

     sec type  year  m1  m2
0   dim1   vc  2015   1  20
1   dim2   vc  2015   2  21
2   dim3   vc  2015   3  22
3   dim1   vu  2015  10  30
4   dim2   vu  2015  11  31
5   dim3   vu  2015  12  32
6   dim1   vc  2016   4  23
7   dim2   vc  2016   5  24
8   dim3   vc  2016   6  25
9   dim1   vu  2016  13  33
10  dim2   vu  2016  14  34
11  dim3   vu  2016  15  35

Upvotes: 1

akuiper
akuiper

Reputation: 214987

You can use wide_to_long with a few more steps to clean up the columns:

df = pd.wide_to_long(data, stubnames=['m1', 'm2'], i=['sec'], j='type_year', suffix='\w+').reset_index()
df[['type', 'year']] = df.type_year.str.split('_', expand=True)
df.year = '20' + df.year
df.drop('type_year', axis=1, inplace=True)

df
     sec  m1  m2 type  year
0   dim1   1  20   vc  2015
1   dim2   2  21   vc  2015
2   dim3   3  22   vc  2015
3   dim1  10  30   vu  2015
4   dim2  11  31   vu  2015
5   dim3  12  32   vu  2015
6   dim1   4  23   vc  2016
7   dim2   5  24   vc  2016
8   dim3   6  25   vc  2016
9   dim1  13  33   vu  2016
10  dim2  14  34   vu  2016
11  dim3  15  35   vu  2016

Upvotes: 2

Related Questions