Reputation: 571
I have a table:
genome start end strand etc
GUT_GENOME270877.fasta 98 396 +
GUT_GENOME270877.fasta 384 574 -
GUT_GENOME270877.fasta 593 984 +
GUT_GENOME270877.fasta 991 999 -
I'd like to make a new table with a coordinates column, which joins the start and end columns and looks like this:
genome start end strand etc coordinates
GUT_GENOME270877.fasta 98 396 + 98..396
GUT_GENOME270877.fasta 384 574 - complement(384..574)
GUT_GENOME270877.fasta 593 984 + 593..984
GUT_GENOME270877.fasta 991 999 - complement(991..999)
If there's a - in the strand column, I don't want just
df['coordinates'] = df['start'].astype(str) + '..' + df['end'].astype(str)
but also the surrounding brackets and the word complement, like this:
df['coordinates'] = 'complement(' + df['start'].astype(str) + '..' + df['end'].astype(str) + ')'
The only thing I'm missing is how to introduce the condition.
Upvotes: 1
Views: 79
Reputation: 260975
You can use numpy.where:
import numpy as np

m = df['strand'].eq('-')  # True for rows on the minus strand
df['coordinates'] = (np.where(m, 'complement(', '')
                     + df['start'].astype(str) + '..' + df['end'].astype(str)
                     + np.where(m, ')', ''))
Or boolean indexing:
m = df['strand'].eq('-')
df['coordinates'] = df['start'].astype(str)+'..'+df['end'].astype(str)
df.loc[m, 'coordinates'] = 'complement('+df.loc[m, 'coordinates']+')'
Output:
genome start end strand coordinates
0 GUT_GENOME270877.fasta 98 396 + 98..396
1 GUT_GENOME270877.fasta 384 574 - complement(384..574)
2 GUT_GENOME270877.fasta 593 984 + 593..984
3 GUT_GENOME270877.fasta 991 999 - complement(991..999)
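For anyone who wants to try this end to end, here is a minimal self-contained sketch. It rebuilds the sample table from the question by hand (the empty etc column is left out, which is an assumption about the real data) and runs both approaches. The names span and by_mask are only illustrative, and the numpy.where call is a slight variation that selects between two complete strings instead of concatenating a prefix and suffix.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question (the empty "etc" column is omitted).
df = pd.DataFrame({
    "genome": ["GUT_GENOME270877.fasta"] * 4,
    "start": [98, 384, 593, 991],
    "end": [396, 574, 984, 999],
    "strand": ["+", "-", "+", "-"],
})

# Boolean mask: True for rows on the minus strand.
m = df["strand"].eq("-")

# Plain "start..end" string for every row.
span = df["start"].astype(str) + ".." + df["end"].astype(str)

# Approach 1: numpy.where picks the wrapped string where the mask is True.
df["coordinates"] = np.where(m, "complement(" + span + ")", span)

# Approach 2: boolean indexing, wrapping only the masked rows.
by_mask = span.copy()
by_mask[m] = "complement(" + by_mask[m] + ")"

print(df)
```

Both approaches should give the same column. numpy.where keeps it to a single expression, while boolean indexing may be easier to extend if more conditions are added later.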
Upvotes: 2