Efficient replace value by value in other column

Question

I am trying to replace one column by another if the values in that column are equal to a string. The value of this string is "wo". If this shows up in column y, replace by column x. Currently I use the following code:

df.y.replace("wo",df.x)

This runs for a very long time (millions of observations, equals days of calculations).

Is there a more efficient way to replace ?

Just in case, the data looks as follows:

 y    x    other variables
 1    mo    something
 2    2     something
 3    3     something
 wo   >5    something
 4    4     something
 wo   7     something

It has to look like:

 y    x    other variables
 1    mo    something
 2    2     something
 3    3     something
 >5   >5   something
 4    4     something
 7    7     something

MaxU - stand with Ukraine · Accepted Answer

try this:

df.loc[(df.y == 'wo'), 'y'] = df.x

it will first filter only those rows where df.y == 'wo' and will assign x column's value to 'y' column

Timeit report:

In [304]: %timeit df.y.replace("wo",df.x)
100 loops, best of 3: 13.9 ms per loop

In [305]: %timeit df.loc[(df.y == 'wo'), 'y'] = df.x
100 loops, best of 3: 3.31 ms per loop

In [306]: %timeit df.ix[(df.y == 'wo'), 'y'] = df.x
100 loops, best of 3: 3.31 ms per loop

UPDATE: starting from Pandas 0.20.1 the .ix indexer is deprecated, in favor of the more strict .iloc and .loc indexers.

Efficient replace value by value in other column

Answers (2)

Related Questions