Reputation: 45
Is there any efficient way to reshape a dataframe from:
(A1, A2, A3, B1, B2, B3, C1, C2, C3, TT, YY and ZZ are columns)
A1 A2 A3 B1 B2 B3 C1 C2 C3 TT YY ZZ
11 22 33 44 55 66 77 88 99 23 24 25
11 22 33 44 55 66 77 88 99 23 24 25
11 22 33 44 55 66 77 88 99 23 24 25
11 22 33 44 55 66 77 88 99 23 24 25
11 22 33 44 55 66 77 88 99 23 24 25
11 22 33 44 55 66 77 88 99 23 24 25
TO:
HH JJ KK TT YY ZZ
11 22 33 23 24 25
11 22 33 23 24 25
11 22 33 23 24 25
11 22 33 23 24 25
11 22 33 23 24 25
11 22 33 23 24 25
44 55 66 23 24 25
44 55 66 23 24 25
44 55 66 23 24 25
44 55 66 23 24 25
44 55 66 23 24 25
44 55 66 23 24 25
77 88 99 23 24 25
77 88 99 23 24 25
77 88 99 23 24 25
77 88 99 23 24 25
77 88 99 23 24 25
77 88 99 23 24 25
HH, JJ and KK are new columns where I would make a vertical stack of column A, B, C and keeping in horizontal stack TT, YY and ZZ
A1 A2 A3 TT YY ZZ
B1 B2 B3 TT YY ZZ
C1 C2 C3 TT YY ZZ
Thanks for your help
Upvotes: 3
Views: 84
Reputation: 28649
a bit longer than the previous solution :
#extract columns ending with numbers
abc = df.filter(regex='\d$')
#sort columns into separate lists
from itertools import groupby
from operator import itemgetter
cols = sorted(abc.columns,key=itemgetter(0))
filtered_columns = [list(g) for k,g in groupby(cols,key=itemgetter(0))]
#iterate through the dataframe
#and stack them
abc_stack = pd.concat([abc.filter(col)
.set_axis(['HH','JJ','KK'],axis='columns')
for col in filtered_columns],
ignore_index=True)
#filter for columns ending with alphabets
tyz = df.filter(regex= '[A-Z]$')
#get the dataframe to be the same length as abc_stack
tyz_stack = pd.concat([tyz] * len(filtered_columns),ignore_index=True)
#combine both dataframes
res = pd.concat([abc_stack,tyz_stack], axis=1)
res
HH JJ KK TT YY ZZ
0 11 22 33 23 24 25
1 11 22 33 23 24 25
2 11 22 33 23 24 25
3 11 22 33 23 24 25
4 11 22 33 23 24 25
5 11 22 33 23 24 25
6 44 55 66 23 24 25
7 44 55 66 23 24 25
8 44 55 66 23 24 25
9 44 55 66 23 24 25
10 44 55 66 23 24 25
11 44 55 66 23 24 25
12 77 88 99 23 24 25
13 77 88 99 23 24 25
14 77 88 99 23 24 25
15 77 88 99 23 24 25
16 77 88 99 23 24 25
17 77 88 99 23 24 25
UPDATE : 2021-01-08
The reshaping process could be abstracted by using the pivot_longer function from pyjanitor; at the moment you have to install the latest development version from github:
The data you shared has patterns (some columns ends with 1
, others with 2
, and some end with 3
), we can use these patterns to reshape the data;
# install latest dev version
# pip install git+https://github.com/ericmjl/pyjanitor.git
import janitor
(df.pivot_longer(names_to=("HH", "JJ", "KK"),
names_pattern=("1$", "2$", "3$"),
index=("TT", "YY", "ZZ")
)
.sort_index(axis="columns"))
Basically, what it does is look for columns that end with 1
, aggregates them into one column ("TT") and does the same for 2
and 3
.
Upvotes: 1
Reputation: 4607
You can use Column splitting and concatenation
df = pd.read_clipboard()
ColSets= [df.columns[i:i+3] for i in np.arange(0,len(df.columns)-3,3)]
LCols = df.columns[-3:]
NewDf = pd.concat([df[ColSet].join(df[LCols]).T.reset_index(drop=True).T for ColSet in ColSets])
NewDf.columns = ['HH', 'JJ', 'KK', 'TT', 'YY', 'ZZ']
Out:
HH JJ KK TT YY ZZ
0 11 22 33 23 24 25
1 11 22 33 23 24 25
2 11 22 33 23 24 25
3 11 22 33 23 24 25
4 11 22 33 23 24 25
5 11 22 33 23 24 25
0 44 55 66 23 24 25
1 44 55 66 23 24 25
2 44 55 66 23 24 25
3 44 55 66 23 24 25
4 44 55 66 23 24 25
5 44 55 66 23 24 25
0 77 88 99 23 24 25
1 77 88 99 23 24 25
2 77 88 99 23 24 25
3 77 88 99 23 24 25
4 77 88 99 23 24 25
5 77 88 99 23 24 25
Upvotes: 1