Renaming a subset of index from a dataframe

Question

I have a dataframe that looks like this:

Geneid  PRKCZ.exon1 PRKCZ.exon2 PRKCZ.exon3 PRKCZ.exon4 PRKCZ.exon5 PRKCZ.exon6 PRKCZ.exon7 PRKCZ.exon8 PRKCZ.exon9 PRKCZ.exon10    ... FLNA.exon31 FLNA.exon32 FLNA.exon33 FLNA.exon34 FLNA.exon35 FLNA.exon36 FLNA.exon37 FLNA.exon38 MTCP1.exon1 MTCP1.exon2
S28 22  127 135 77  120 159 49  38  409 67  ... 112 104 37  83  47  18  110 70  167 19
22  3   630 178 259 142 640 77  121 521 452 ... 636 288 281 538 276 109 242 314 790 484
S04 16  658 320 337 315 881 188 162 769 577 ... 1291    420 369 859 507 208 554 408 1172    706
56  26  663 343 390 314 1090    263 200 844 592 ... 675 243 250 472 280 133 300 275 750 473
S27 13  1525    571 1081    560 1867    427 370 1348    1530    ... 1817    926 551 1554    808 224 971 1313    1293    701
5 rows × 8297 columns

I need to add an extra column to the dataframe above, holding information about the index. I made a list, healthy, of all the index values that should be labelled h; everything else should be labelled d.

So I tried the following lines:

healthy=['39','41','49','50','51','52','53','54','56']

H_type =pd.Series( ['h' for x in df.loc[healthy]  
                    else 'd' for x in df]).to_frame()

But it throws the following error:

SyntaxError: invalid syntax

Any help would be really appreciated

In the end I am aiming for something like this:

Geneid  sampletype  SSX4.exon4  SSX2.exon11 DUX4.exon5  SSX2.exon3  SSX4.exon5  SSX2.exon10 SSX4.exon7  SSX2.exon9  SSX4.exon8  ... SETD2.exon21    FAT2.exon15 CASC5.exon8 FAT1.exon21 FAT3.exon9  MLL.exon31  NACA.exon7  RANBP2.exon20   APC.exon16  APOB.exon4
    S28 h   0   0   0   0   0   0   0   0   0   ... 2480    2003    2749    1760    2425    3330    4758    2508    4367    4094
    22  h   0   0   0   0   0   0   0   0   0   ... 8986    7200    10123   12422   14528   18393   9612    15325   8788    11584
    S04 h   0   0   0   0   0   0   0   0   0   ... 14518   16657   17500   15996   17367   17948   18037   19446   24179   28924
    56  h   0   0   0   0   0   0   0   0   0   ... 17784   17846   20811   17337   18135   19264   19336   22512   28318   32405
    S27 h   0   0   0   0   0   0   0   0   0   ... 10375   20403   11559   18895   18410   12754   21527   11603   16619   37679

Thank you

jezrael · Accepted Answer

I think you can use numpy.where with isin, if Geneid is a column.

EDIT by comment:

The Geneid column can contain integers, so cast it to string with astype before calling isin.

import numpy as np

healthy=['39','41','49','50','51','52','53','54','56']

# 'h' where Geneid (as a string) is in healthy, otherwise 'd'
df['type'] = np.where(df['Geneid'].astype(str).isin(healthy), 'h', 'd')

#get the last column name as a list
print(df.columns[-1].split())
['type']

#create new list: the last column first, then all columns without the last
cols = df.columns[-1].split() + df.columns[:-1].tolist()
print(cols)
['type', 'Geneid', 'PRKCZ.exon1', 'PRKCZ.exon2', 'PRKCZ.exon3', 'PRKCZ.exon4', 
 'PRKCZ.exon5', 'PRKCZ.exon6', 'PRKCZ.exon7', 'PRKCZ.exon8', 'PRKCZ.exon9',
 'PRKCZ.exon10', 'FLNA.exon31', 'FLNA.exon32', 'FLNA.exon33', 'FLNA.exon34',
 'FLNA.exon35', 'FLNA.exon36', 'FLNA.exon37', 'FLNA.exon38', 'MTCP1.exon1', 'MTCP1.exon2']

#reorder columns
print(df[cols])
  type Geneid  PRKCZ.exon1  PRKCZ.exon2  PRKCZ.exon3  PRKCZ.exon4  \
0    d    S28           22          127          135           77   
1    d     22            3          630          178          259   
2    d    S04           16          658          320          337   
3    h     56           26          663          343          390   
4    d    S27           13         1525          571         1081   

   PRKCZ.exon5  PRKCZ.exon6  PRKCZ.exon7  PRKCZ.exon8     ...       \
0          120          159           49           38     ...        
1          142          640           77          121     ...        
2          315          881          188          162     ...        
3          314         1090          263          200     ...        
4          560         1867          427          370     ...        

   FLNA.exon31  FLNA.exon32  FLNA.exon33  FLNA.exon34  FLNA.exon35  \
0          112          104           37           83           47   
1          636          288          281          538          276   
2         1291          420          369          859          507   
3          675          243          250          472          280   
4         1817          926          551         1554          808   

   FLNA.exon36  FLNA.exon37  FLNA.exon38  MTCP1.exon1  MTCP1.exon2  
0           18          110           70          167           19  
1          109          242          314          790          484  
2          208          554          408         1172          706  
3          133          300          275          750          473  
4          224          971         1313         1293          701  

[5 rows x 22 columns]
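
If Geneid is the index rather than a column, as the question's wording suggests, the same idea should still apply. The following is only a minimal sketch under that assumption: the small frame is a made-up stand-in for the real data, the column name sampletype is taken from the desired output in the question, and DataFrame.insert is used here in place of the column reordering above. The last line shows the list comprehension from the question with the conditional written in the order Python expects (value if condition else other, before the for).

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of the frame in the question, with Geneid as the index.
df = pd.DataFrame(
    {"PRKCZ.exon1": [22, 3, 16, 26, 13],
     "PRKCZ.exon2": [127, 630, 658, 663, 1525]},
    index=pd.Index(["S28", 22, "S04", 56, "S27"], name="Geneid"),
)

healthy = ['39', '41', '49', '50', '51', '52', '53', '54', '56']

# Cast the index labels to str so the integer 56 matches the string '56'.
is_healthy = df.index.astype(str).isin(healthy)

# Insert the label column at position 0, so no reordering is needed afterwards.
df.insert(0, "sampletype", np.where(is_healthy, "h", "d"))

# The comprehension from the question, with the conditional expression
# placed before the `for` clause.
labels = ["h" if str(x) in healthy else "d" for x in df.index]
```

Both approaches should give the same labels; np.where with isin is vectorised, which may matter on a frame with thousands of columns and many rows.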
