Reputation: 303
I have a dataset which look like this
time channel min sd mag. frequency
12:00 X 12.0 2.3 x11 fx11
12:00 X 12.0 2.3 x12 fx12
12:00 X 12.0 2.3 x13 fx13
12:00 X 12.0 2.3 x14 fx14
12:00 X 12.0 2.3 x15 fx15
12:00 Y 17.0 2.7 y11 fy11
12:00 Y 17.0 2.7 y12 fy12
12:00 Y 17.0 2.7 y13 fy13
12:00 Y 17.0 2.7 y14 fy14
12:00 Y 17.0 2.7 y15 fy15
12:00 Z 15.0 4.3 z11 fz11
12:00 Z 15.0 4.3 z12 fz12
12:00 Z 15.0 4.3 z13 fz13
12:00 Z 15.0 4.3 z14 fz14
12:00 Z 15.0 4.3 z15 fz15
12:01 X 13.0 4.9 x21 fx21
.... ... ... ... ... .....
.... ..... .... ... .... ..... ....
As you could see that for channel X, Y, Z there are entries like 'time', 'min' and 'sd' repeating 5 times, however 'mag.' and 'frequency' are changing each time. The shape of this dataset is (740231, 6), where this 15 rows for channel X,Y,Z keep repeating as I described above.
I would like to get rid of this repetition and would like to transform this dataset like this:
time channel min sd m1 f1 m2 f2 m3 f3 m4 f4 m5 f5
12:00 X 12.0 2.3 x11 fx11 x12 fx12 x13 fx13 x14 fx14 x15 fx15
12:00 Y 17.0 2.7 y11 fy11 y12 fy12 y13 fy13 y14 fy14 y15 fy15
12:00 Y 15.0 4.3 z11 fz11 z12 fz12 z13 fz13 z14 fz14 z15 fz15
12:01 X 13.0 4.9 x21 fx21 x22 fx22 x23 fx23 x24 fx24 x25 fx25
.... ... ..... ... .... ..... .... ..... .... .... ....
.... ..... .... .... .... ... .... ..... .... .... ... ... ...
which means that 15 rows x 6 columns values are now transformed in 3 rows x 14 columns.
Any suggestions is appreciated. Many thanks for your time.
Best Regards, pooja
Upvotes: 2
Views: 91
Reputation: 863301
If ordering of output column should be swapped - first f
and then m
columns:
cols = ['time','channel','min', 'sd']
d = {'frequency':'f','mag.':'m'}
g = df.groupby(cols).cumcount().add(1).astype(str)
df = df.rename(columns=d).set_index(cols + [g]).unstack().sort_index(axis=1, level=1)
df.columns = df.columns.map(''.join)
df = df.reset_index()
print (df)
time channel min sd f1 m1 f2 m2 f3 m3 f4 m4 f5 \
0 12:00 X 12.0 2.3 fx11 x11 fx12 x12 fx13 x13 fx14 x14 fx15
1 12:00 Y 17.0 2.7 fy11 y11 fy12 y12 fy13 y13 fy14 y14 fy15
2 12:00 Z 15.0 4.3 fz11 z11 fz12 z12 fz13 z13 fz14 z14 fz15
3 12:01 X 13.0 4.9 fx21 x21 NaN NaN NaN NaN NaN NaN NaN
m5
0 x15
1 y15
2 z15
3 NaN
Explanation:
rename
columns by dictionary
set_index
by counter Series
created by cumcount
with added 1
and converted to strings
unstack
MultiIndex
by sort_index
map
and join
reset_index
for column from index
If ordering of output columns is important is possible use double rename
of columns:
cols = ['time','channel','min', 'sd']
d = {'frequency':2,'mag.':1}
g = df.groupby(cols).cumcount().add(1).astype(str)
df = (df.rename(columns=d)
.set_index(cols + [g])
.unstack()
.sort_index(axis=1, level=1)
.rename(columns={2:'f', 1:'m'}))
df.columns = df.columns.map(''.join)
df = df.reset_index()
print (df)
time channel min sd m1 f1 m2 f2 m3 f3 m4 f4 m5 \
0 12:00 X 12.0 2.3 x11 fx11 x12 fx12 x13 fx13 x14 fx14 x15
1 12:00 Y 17.0 2.7 y11 fy11 y12 fy12 y13 fy13 y14 fy14 y15
2 12:00 Z 15.0 4.3 z11 fz11 z12 fz12 z13 fz13 z14 fz14 z15
3 12:01 X 13.0 4.9 x21 fx21 NaN NaN NaN NaN NaN NaN NaN
f5
0 fx15
1 fy15
2 fz15
3 NaN
Upvotes: 1