Win Yen Tham
Win Yen Tham

Reputation: 55

How to change texts in columns and then combine 1st row with the column headers?

I have the following dataframe:

data={'a1':['X1',2,3,4,5],'Unnamed: 02':['Y1',5,6,7,8],'b1':['X2',5,3,7,9],'Unnamed: 05':['Y2',5,8,9,3],'c1':['X3',4,5,7,5],'Unnamed: 07':['Y3',5,8,9,3],'d1':['P',2,4,5,7],'Unnamed: 09':['M',8,4,6,7]}
df=pd.DataFrame(data)
df.columns=df.columns.to_series().mask(lambda x: x.str.startswith('Unnamed')).ffill()
df

enter image description here

There are a few things which I would like to do:

  1. Change the rows containing (X1, X2 & X3) into just 'X (vice versa for Y1,Y2,Y3 into 'Y)
  2. Combine the existing column header with the row containing X,Y,P,M

The outcome should look like:

  1. Change the rows containing (X1, X2 & X3) into just 'X (vice versa for Y1,Y2,Y3 into 'Y)

enter image description here

  1. Combine the existing column header with the row containing X,Y,P,M - Also note that the 'P' and 'M' completely replaces 'd1' respectively.

enter image description here

Upvotes: 1

Views: 82

Answers (3)

cottontail
cottontail

Reputation: 23331

You can append the letters in the first row to the column labels conditionally. Then slice the dataframe to remove the first row and reset its index.

import numpy as np
# extract only the letters from first row
first_row = df.iloc[0].str.extract('([A-Z]+)')[0]
# update column names by first_row
# the columns with P and M in it have their names completely replaced
df.columns = np.where(first_row.isin(['P', 'M']), first_row, df.columns + '_' + first_row.values)
# remove first row
df = df.iloc[1:].reset_index(drop=True)
df

enter image description here

Upvotes: 1

BeRT2me
BeRT2me

Reputation: 13242

df.columns = [x + '_' + y[0] if len(y)>1 else y for x, y in df.iloc[0].reset_index().values]
df = df[1:].reset_index(drop=True)
print(df)

Output:

  a1_X a1_Y b1_X b1_Y c1_X c1_Y  P  M
0    2    5    5    5    4    5  2  8
1    3    6    3    8    5    8  4  4
2    4    7    7    9    7    9  5  6
3    5    8    9    3    5    3  7  7

Upvotes: 1

ko3
ko3

Reputation: 1811

Alternatively, you can do something like this:

# Transpose data frame and make index to column
df = df.T.reset_index()
# Assign new column, use length of first row as condition
df["column"] = np.where(df[0].str.len() > 1, df["index"].str[:] + "_" + df[0].str[0], df[0].str[0])
df.drop(columns=["index", 0]).set_index("column").T.rename_axis(None, axis=1)

----------------------------------------------------------
    a1_X    a1_Y    b1_X    b1_Y    c1_X    c1_Y    P   M
1   2       5       5       5       4       5       2   8
2   3       6       3       8       5       8       4   4
3   4       7       7       9       7       9       5   6
4   5       8       9       3       5       3       7   7

----------------------------------------------------------

It's a more general solution as it uses the length of each row-zero entry as a condition, not the actual values 'P' and 'M'. Thus, it holds for each single character string.

Upvotes: 1

Related Questions