Violet
Violet

Reputation: 303

transforming old columns to new columns in Pandas Dataframe

I have a dataset which look like this

time channel  min    sd   mag.   frequency  
12:00   X     12.0   2.3  x11     fx11  
12:00   X     12.0   2.3  x12     fx12  
12:00   X     12.0   2.3  x13     fx13  
12:00   X     12.0   2.3  x14     fx14  
12:00   X     12.0   2.3  x15     fx15  
12:00   Y     17.0   2.7  y11     fy11  
12:00   Y     17.0   2.7  y12     fy12  
12:00   Y     17.0   2.7  y13     fy13  
12:00   Y     17.0   2.7  y14     fy14  
12:00   Y     17.0   2.7  y15     fy15  
12:00   Z     15.0   4.3  z11     fz11  
12:00   Z     15.0   4.3  z12     fz12  
12:00   Z     15.0   4.3  z13     fz13  
12:00   Z     15.0   4.3  z14     fz14  
12:00   Z     15.0   4.3  z15     fz15  
12:01   X     13.0   4.9  x21     fx21  
....     ...   ...   ...  ...     .....  
....  .....   ....  ...  ....  .....  .... 

As you could see that for channel X, Y, Z there are entries like 'time', 'min' and 'sd' repeating 5 times, however 'mag.' and 'frequency' are changing each time. The shape of this dataset is (740231, 6), where this 15 rows for channel X,Y,Z keep repeating as I described above.

I would like to get rid of this repetition and would like to transform this dataset like this:

  time channel min  sd   m1  f1    m2 f2   m3   f3   m4  f4    m5  f5    
  12:00   X    12.0 2.3  x11 fx11 x12 fx12 x13 fx13 x14  fx14  x15 fx15  
  12:00   Y    17.0 2.7  y11 fy11 y12 fy12 y13 fy13 y14  fy14  y15 fy15  
  12:00   Y    15.0 4.3  z11 fz11 z12 fz12 z13 fz13 z14  fz14  z15 fz15  
  12:01   X    13.0 4.9  x21 fx21 x22 fx22 x23 fx23 x24  fx24  x25 fx25  
  ....   ...    .....   ...  ....  ..... ....  .....  ....  .... ....  
.... .....  ....  ....  ....  ...  ....  .....  ....  ....  ...  ...  ... 

which means that 15 rows x 6 columns values are now transformed in 3 rows x 14 columns.

Any suggestions is appreciated. Many thanks for your time.

Best Regards, pooja

Upvotes: 2

Views: 91

Answers (1)

jezrael
jezrael

Reputation: 863301

If ordering of output column should be swapped - first f and then m columns:

cols = ['time','channel','min', 'sd']
d = {'frequency':'f','mag.':'m'}
g = df.groupby(cols).cumcount().add(1).astype(str)
df = df.rename(columns=d).set_index(cols + [g]).unstack().sort_index(axis=1, level=1)
df.columns = df.columns.map(''.join)
df = df.reset_index()
print (df)
    time channel   min   sd    f1   m1    f2   m2    f3   m3    f4   m4    f5  \
0  12:00       X  12.0  2.3  fx11  x11  fx12  x12  fx13  x13  fx14  x14  fx15   
1  12:00       Y  17.0  2.7  fy11  y11  fy12  y12  fy13  y13  fy14  y14  fy15   
2  12:00       Z  15.0  4.3  fz11  z11  fz12  z12  fz13  z13  fz14  z14  fz15   
3  12:01       X  13.0  4.9  fx21  x21   NaN  NaN   NaN  NaN   NaN  NaN   NaN   

    m5  
0  x15  
1  y15  
2  z15  
3  NaN  

Explanation:

  1. First rename columns by dictionary
  2. Then set_index by counter Series created by cumcount with added 1 and converted to strings
  3. Reshape by unstack
  4. Soer second level of MultiIndex by sort_index
  5. Flatten MultiIndex columns by map and join
  6. Last reset_index for column from index

If ordering of output columns is important is possible use double rename of columns:

cols = ['time','channel','min', 'sd']
d = {'frequency':2,'mag.':1}

g = df.groupby(cols).cumcount().add(1).astype(str)
df = (df.rename(columns=d)
        .set_index(cols + [g])
        .unstack()
        .sort_index(axis=1, level=1)
        .rename(columns={2:'f', 1:'m'}))

df.columns = df.columns.map(''.join)
df = df.reset_index()
print (df)
    time channel   min   sd   m1    f1   m2    f2   m3    f3   m4    f4   m5  \
0  12:00       X  12.0  2.3  x11  fx11  x12  fx12  x13  fx13  x14  fx14  x15   
1  12:00       Y  17.0  2.7  y11  fy11  y12  fy12  y13  fy13  y14  fy14  y15   
2  12:00       Z  15.0  4.3  z11  fz11  z12  fz12  z13  fz13  z14  fz14  z15   
3  12:01       X  13.0  4.9  x21  fx21  NaN   NaN  NaN   NaN  NaN   NaN  NaN   

     f5  
0  fx15  
1  fy15  
2  fz15  
3   NaN  

Upvotes: 1

Related Questions