KLM117

Reputation: 467

conditionally replacing values in a column

I have a pandas dataframe, where the 2nd, 3rd and 6th columns look like so:

start end strand
108286 108361 +
734546 734621 -
761233 761309 +

I'm trying to implement a conditional: if strand is +, the value in end should become the corresponding value in start plus 1; if strand is -, the value in start should become the corresponding value in end minus 1. The output should look like this:

start end strand
108286 108287 +
734620 734621 -
761233 761234 +

And where the pseudocode may look like this:

if df["strand"] == "+":
    df["end"] = df["start"] + 1
else:
    df["start"] = df["end"] - 1

I imagine this might be best done with loc/iloc or numpy.where, but I can't seem to get it to work. As always, any help is appreciated!

Upvotes: 0

Views: 55

Answers (2)

user7864386

Reputation:

You could also use numpy.where:

import numpy as np
df[['start', 'end']] = np.where(df[['strand']]=='-', df[['end','end']]-[1,0], df[['start','start']]+[0,1])

Note that this assumes strand can have one of two values: + or -. If it can have any other values, we can use numpy.select instead.

Output:

    start     end strand
0  108286  108287      +
1  734620  734621      -
2  761233  761234      +
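For the case where strand can hold other values, here is a minimal sketch of the numpy.select variant mentioned above. The fourth row with strand "." is made-up sample data, and leaving such rows unchanged is an assumption about the desired behaviour, not something stated in the question.

```python
import numpy as np
import pandas as pd

# Sample data from the question, plus one hypothetical row whose strand is neither + nor -
df = pd.DataFrame({
    "start": [108286, 734546, 761233, 500],
    "end": [108361, 734621, 761309, 600],
    "strand": ["+", "-", "+", "."],
})

plus = (df["strand"] == "+").to_numpy()
minus = (df["strand"] == "-").to_numpy()

# Compute both new columns from the original values before assigning anything.
# Rows matching no condition fall through to `default` and keep their old value.
new_start = np.select([minus], [df["end"].to_numpy() - 1], default=df["start"].to_numpy())
new_end = np.select([plus], [df["start"].to_numpy() + 1], default=df["end"].to_numpy())

df["start"] = new_start
df["end"] = new_end
```

Additional conditions can be handled by appending more entries to the condition and choice lists; numpy.select takes the first matching condition for each row.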

Upvotes: 3

Arnau

Reputation: 741

You are correct: loc is the indexer you are looking for.

df.loc[df.strand=='+','end'] = df.loc[df.strand=='+','start']+1
df.loc[df.strand=='-','start'] = df.loc[df.strand=='-','end']-1
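A self-contained version of the same idea, as a sketch: the dataframe is rebuilt from the three columns shown in the question (the real frame has more columns), and the boolean masks are stored in the illustrative names plus and minus so each comparison is only written once.

```python
import pandas as pd

# Only the three columns shown in the question; the real dataframe has more
df = pd.DataFrame({
    "start": [108286, 734546, 761233],
    "end": [108361, 734621, 761309],
    "strand": ["+", "-", "+"],
})

# Boolean masks selecting the rows for each strand
plus = df["strand"] == "+"
minus = df["strand"] == "-"

# Each assignment touches only the masked rows; all other rows are left untouched
df.loc[plus, "end"] = df.loc[plus, "start"] + 1
df.loc[minus, "start"] = df.loc[minus, "end"] - 1

print(df)
```

Because the two masks select disjoint rows, the order of the two assignments does not matter. Bracket access (df["strand"]) is used instead of df.strand only because it also works for column names that are not valid Python identifiers.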

Upvotes: 3
