Reputation: 8913
Given the following data frame:
pd.DataFrame({"A":[1,2,3],"B":[4,5,6],"C":[6,7,8]})
A B C
0 1 4 6
1 2 5 7
2 3 6 8
3 11 14 16
4 12 15 17
5 13 16 18
I would like to reshape it so it would look like so:
A B C A_1 B_1 C_1 A_2 B_2 C_2
0 1 4 6 2 5 7 3 6 8
1 11 14 16 12 15 17 13 16 18
So every 3 rows are grouped into 1 row
How can I achieve this with pandas?
Upvotes: 10
Views: 2468
Reputation: 863351
One idea is create MultiIndex with integer and modulo division and reshape by DataFrame.unstack
:
a = np.arange(len(df))
df.index = [a // 3, a % 3]
df = df.unstack().sort_index(axis=1, level=1)
df.columns = [f'{a}_{b}' for a, b in df.columns]
print (df)
A_0 B_0 C_0 A_1 B_1 C_1 A_2 B_2 C_2
0 1 4 6 2 5 7 3 6 8
1 11 14 16 12 15 17 13 16 18
For reverse operation is possible use str.split
with DataFrame.stack
:
a = np.arange(len(df))
df1 = (df.set_index(pd.MultiIndex.from_arrays([a // 3, a % 3]))
.unstack().sort_index(axis=1, level=1))
df1.columns = [f'{a}_{b}' for a, b in df1.columns]
print (df1)
A_0 B_0 C_0 A_1 B_1 C_1 A_2 B_2 C_2
0 1 4 6 2 5 7 3 6 8
1 11 14 16 12 15 17 13 16 18
df1.columns = df1.columns.str.split('_', expand=True)
df2 = df1.stack().reset_index(drop=True)
print (df2)
A B C
0 1 4 6
1 2 5 7
2 3 6 8
3 11 14 16
4 12 15 17
5 13 16 18
Upvotes: 12
Reputation: 28709
Data is transformed every three rows : we can use numpy's reshape method to transform the data, and create a cartesian product of range(1,3) with the columns to get the new columns :
from itertools import product
row = len(df)//3
#create new columns
new_columns = df.columns.union(["_".join((letter,str(num)))
for letter,num in product(df.columns,range(1,3))],
sort=False)
#create new dataframe
new_df = pd.DataFrame(np.reshape(df.to_numpy(),(row,-1)),
columns=new_columns)
new_df
A B C A_1 A_2 B_1 B_2 C_1 C_2
0 1 4 6 2 5 7 3 6 8
1 11 14 16 12 15 17 13 16 18
Upvotes: 3
Reputation: 42926
We can group your data by repeating groups of n
(in this case 3) and then use pd.concat
to concat them together over the column axis:
n = 3
grps = df.groupby(df.index // n).cumcount()
dfn = pd.concat([d.reset_index(drop=True) for _, d in df.groupby(grps)], axis=1)
dfn.columns = [f'{col}_{i}' for col, i in zip(dfn.columns, np.arange(dfn.shape[1]) // n)]
A_0 B_0 C_0 A_1 B_1 C_1 A_2 B_2 C_2
0 1 4 6 2 5 7 3 6 8
1 11 14 16 12 15 17 13 16 18
Upvotes: 2
Reputation: 9481
new = pd.concat([df[a::3].reset_index(drop=True) for a in range(3)], axis=1)
new.columns = ['{}_{}'.format(a,b) for b in range(3) for a in 'ABC']
Upvotes: 10
Reputation: 1455
You can try this:
pd.DataFrame(
data=df.values.reshape([-1, df.values.shape[1]*3]),
columns=list(df.columns) + sum([[c+'_'+str(i) for c in df.columns] for i in range(1, 3)], [])
)
Output for your input dataframe:
A B C A_1 B_1 C_1 A_2 B_2 C_2
0 1 4 6 2 5 7 3 6 8
1 11 14 16 12 15 17 13 16 18
Upvotes: 3