ARJ
ARJ

Reputation: 2080

Append a string as suffix to columns in data frame as the values from a column of other dataframe

I have a data frame as following,

data0 = pd.DataFrame({'PE03': ['CTA15', 'CTA15', 'AC007', 'AC007', 'AC007'], 
'AL01': [22, 22, 2, 2, 2], 
'AL02':['12', '13', '14', '15', '16'], 
'AL03' : [2, 2, 3, 3, 3],
 'PL03' :[3,7,8,9,1]})

data0

    AL01  AL02  AL03  PE03   PL03
0   22    12    2     CTA15  3
1   22    13    2     CTA15  7
2   2     14    3     AC007  8
3   2     15    3     AC007  9
4   2     16    3     AC007  1

And another data farme as,

data1 = pd.DataFrame({'main': ['CTA15', 'CTA15', 'AC007', 'AC007', 'AC007'], 
'sub': ['LUD1', 'KLM1', 'SAP1', 'SAP2', 'SOS'], 
'samples':['PE03', 'AL01', 'AL02', 'AL03', 'PL03']})

data1

   main   samples  sub
0  CTA15  PE03     LUD1
1  CTA15  AL01     KLM1
2  AC007  AL02     SAP1
3  AC007  AL03     SAP2
4  AC007  PL03     SOS

And what I wanted is to look for the matching columns from data0 into data1 in column samples and then append the values from sub to columns in data0.

The final output should look like this,

data_2

   AL01_KLM1  AL02_SAP1  AL03_SAP2  PE03_LUD1  PL03_SOS
0  22         12         2          CTA15      3
1  22         13         2          CTA15      7
2  2          14         3          AC007      8
3  2          15         3          AC007      9
4  2          16         3          AC007      1

Any help or suggestions would be great.

Upvotes: 0

Views: 133

Answers (2)

Scott Boston
Scott Boston

Reputation: 153510

You can merge the tables to create the concatenated column names and reshape:

df1 = data0.T.merge(data1, left_index=True, right_on='samples')
df1 = df1.assign(colname=df1['samples']+'_'+df1['sub'])
data_2 = df1.set_index('colname').T.drop(data1.columns).rename_axis(None,1))

Output:

  AL01_KLM1 AL02_SAP1 AL03_SAP2 PE03_LUD1 PL03_SOS
0        22        12         2     CTA15        3
1        22        13         2     CTA15        7
2         2        14         3     AC007        8
3         2        15         3     AC007        9
4         2        16         3     AC007        1

Upvotes: 1

Ciarán Tobin
Ciarán Tobin

Reputation: 7536

I think this is what you want:

 sub = data1.set_index('samples').loc[data0.columns]['sub']
 data0.columns = [x + '_' + y for x, y in zip(data0.columns, sub)]

It just finds the matching sub value for each sample and appends it to the column name in data0.

Upvotes: 1

Related Questions