Peter
Peter

Reputation: 441

Efficient replace value by value in other column

I am trying to replace one column by another if the values in that column are equal to a string. The value of this string is "wo". If this shows up in column y, replace by column x. Currently I use the following code:

df.y.replace("wo",df.x) 

This runs for a very long time (millions of observations, equals days of calculations).

Is there a more efficient way to replace ?

Just in case, the data looks as follows:

 y    x    other variables
 1    mo    something
 2    2     something
 3    3     something
 wo   >5    something
 4    4     something
 wo   7     something

It has to look like:

 y    x    other variables
 1    mo    something
 2    2     something
 3    3     something
 >5   >5   something
 4    4     something
 7    7     something

Upvotes: 2

Views: 148

Answers (2)

MaxU - stand with Ukraine
MaxU - stand with Ukraine

Reputation: 210862

try this:

df.loc[(df.y == 'wo'), 'y'] = df.x

it will first filter only those rows where df.y == 'wo' and will assign x column's value to 'y' column

Timeit report:

In [304]: %timeit df.y.replace("wo",df.x)
100 loops, best of 3: 13.9 ms per loop

In [305]: %timeit df.loc[(df.y == 'wo'), 'y'] = df.x
100 loops, best of 3: 3.31 ms per loop

In [306]: %timeit df.ix[(df.y == 'wo'), 'y'] = df.x
100 loops, best of 3: 3.31 ms per loop

UPDATE: starting from Pandas 0.20.1 the .ix indexer is deprecated, in favor of the more strict .iloc and .loc indexers.

Upvotes: 4

user25064
user25064

Reputation: 2120

First, Pandas should be notified that that string value "wo" represents IEEE double NaN (aka numpy nan etc). See for example the na_values parameter of the read_csv method here. This will allow the entire column to be stored as double which will increase efficiency. Then use something like this to replace the NaN values with the values from the other column.

Upvotes: 2

Related Questions