Reputation: 361
I have reduced my large data frame to this small example:
IDX POS REF ALT
13 633 C A
15 643 C T
42 2015 G A
43 2016 G A
151 9538 T C
154 9542 TC TCC,T
169 10041 T A
170 10041 T TAA,TA
The data comes from a genomic region. Each row gives a nucleotide position (POS), the reference genome nucleotide at that position (REF), and the alternative nucleotides observed in different people at that same position (ALT). Some positions (9542 and 10041) have two different alternatives, separated by a comma.
I want to go through the ALT column, count the number of distinct alternatives listed in each row, and store the counts in a new column. I haven't found how to do this with pandas.
The new data frame will then look like this:
IDX POS REF ALT COUNT
13 633 C A 1
15 643 C T 1
42 2015 G A 1
43 2016 G A 1
151 9538 T C 1
154 9542 TC TCC,T 2
169 10041 T A 1
170 10041 T TAA,TA 2
How can I do this with pandas (or plain Python)?
Thank you.
Rodrigo
Upvotes: 1
Views: 103
Reputation: 294258
I'd count the commas and add 1:
df['COUNT'] = df.ALT.str.count(',') + 1
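For reference, here is a minimal, self-contained sketch that rebuilds the sample frame from the question and applies the comma count. It assumes every ALT value is a non-empty, comma-separated string with no missing values. The `N_UNIQUE` column is a hypothetical name added only for illustration: it counts distinct entries, which differs from the comma count only if the same alternative is listed twice in one row.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "IDX": [13, 15, 42, 43, 151, 154, 169, 170],
    "POS": [633, 643, 2015, 2016, 9538, 9542, 10041, 10041],
    "REF": ["C", "C", "G", "G", "T", "TC", "T", "T"],
    "ALT": ["A", "T", "A", "A", "C", "TCC,T", "A", "TAA,TA"],
})

# Number of comma-separated entries = number of commas + 1.
df["COUNT"] = df["ALT"].str.count(",") + 1

# Illustrative variant (hypothetical column name): split on commas and
# count distinct entries, so a repeated alternative is counted once.
df["N_UNIQUE"] = df["ALT"].str.split(",").apply(lambda alts: len(set(alts)))

print(df)
```

If ALT can contain missing values, the comma count gives NaN for those rows and the `apply` variant would raise an error, so they would need to be handled first (for example with `fillna`).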
Upvotes: 2