Reputation: 315
I have the following Pandas dataframe:
chr POS RS REF ALT
1 chr1 981931 rs2465128 A GA
2 chr1 982994 rs10267 T C
3 chr1 984302 rs9442391 T C
4 chr1 987200 rs9803031 C T
5 chr1 990280 rs4275402 CT C
I would like to add another column that has the value "SNP" if the values in both the "REF" and "ALT" columns have length 1, and the value "INDEL" if either length differs from 1, so the output should look like this:
chr POS RS REF ALT TYPE
1 chr1 981931 rs2465128 A GA INDEL
2 chr1 982994 rs10267 T C SNP
3 chr1 984302 rs9442391 T C SNP
4 chr1 987200 rs9803031 C T SNP
5 chr1 990280 rs4275402 CT C INDEL
I have written some code that works, but it is very slow. Is there a more efficient way to do this, for example with list comprehensions or lambda functions?
My code:
    for index, row in table.iterrows():
        if len(row['REF']) == 1 and len(row['ALT']) == 1:
            table.loc[index, "TYPE"] = "SNP"
        else:
            table.loc[index, "TYPE"] = "INDEL"
Thanks a lot
Rachael
Upvotes: 1
Views: 60
Reputation: 862406
Use Series.str.len to get the string lengths and set the new column with numpy.where:
    import numpy as np

    # True where both REF and ALT are exactly one character long
    m = (table['REF'].str.len() == 1) & (table['ALT'].str.len() == 1)
    table["TYPE"] = np.where(m, "SNP", "INDEL")
    print(table)
chr POS RS REF ALT TYPE
1 chr1 981931 rs2465128 A GA INDEL
2 chr1 982994 rs10267 T C SNP
3 chr1 984302 rs9442391 T C SNP
4 chr1 987200 rs9803031 C T SNP
5 chr1 990280 rs4275402 CT C INDEL
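For anyone who wants to try this without the original file, here is a minimal self-contained sketch of the same approach. The DataFrame is rebuilt by hand from the five rows in the question, and the name is_snp is only an illustrative choice for the mask; nothing else is assumed beyond pandas and NumPy being installed.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question; the index starts at 1 as shown there.
table = pd.DataFrame(
    {
        "chr": ["chr1"] * 5,
        "POS": [981931, 982994, 984302, 987200, 990280],
        "RS": ["rs2465128", "rs10267", "rs9442391", "rs9803031", "rs4275402"],
        "REF": ["A", "T", "T", "C", "CT"],
        "ALT": ["GA", "C", "C", "T", "C"],
    },
    index=range(1, 6),
)

# Boolean mask: True only where both alleles are exactly one character long.
is_snp = (table["REF"].str.len() == 1) & (table["ALT"].str.len() == 1)

# Vectorized selection: "SNP" where the mask is True, "INDEL" elsewhere.
table["TYPE"] = np.where(is_snp, "SNP", "INDEL")
```

Both steps operate on whole columns at once, which is why this tends to be much faster than assigning row by row inside iterrows. One thing to keep in mind: if REF or ALT contains missing values, str.len() returns NaN for them, the comparison with 1 is False, and those rows end up labelled "INDEL", so you may want to handle them separately.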
Upvotes: 4