Joe

Reputation: 309

Copy values from one column to another in a pandas DataFrame based on conditions

I have a dataframe:

import pandas as pd

df = pd.DataFrame({'A': [0, 1, 2, 3, 4],
                   'B': [5, '-', '-', 8, 9],
                   'C': ['-', 'b', 'c', '-', 'e']})

How can I replace values in df['B'] with the corresponding values from df['C'] where df['B'] == '-' and df['C'] != '-'?

Expected output:

({'A': [0, 1, 2, 3, 4],
  'B': [5, 'b', 'c', 8, 9],
  'C': ['-', 'b', 'c', '-', 'e']})


I used:

replace = (df['B'] == '-') & (df['C'] != '-')
df['B'][replace] = df['C']

Is there any better way?

Upvotes: 1

Views: 146

Answers (3)

jezrael

Reputation: 862481

You are close; use DataFrame.loc to select the rows and the column in a single indexing step:

replace = (df['B'] == '-') & (df['C'] != '-')
df.loc[replace, 'B'] = df['C']
print (df)
   A  B  C
0  0  5  -
1  1  b  b
2  2  c  c
3  3  8  -
4  4  9  e
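A self-contained version of the `loc` approach, runnable as a plain script, with column `B` checked against the expected output from the question. Nothing is assumed beyond pandas being installed; `expected_b` is simply copied from the question.

```python
import pandas as pd

# Sample data from the question; B mixes ints and strings, so it has object dtype
df = pd.DataFrame({'A': [0, 1, 2, 3, 4],
                   'B': [5, '-', '-', 8, 9],
                   'C': ['-', 'b', 'c', '-', 'e']})

# True only where B holds the placeholder '-' and C holds a real value
replace = (df['B'] == '-') & (df['C'] != '-')

# Only the masked rows of B are written; the right-hand Series is aligned on the index
df.loc[replace, 'B'] = df['C']

# Column B as given in the question's expected output
expected_b = [5, 'b', 'c', 8, 9]
print(df['B'].tolist() == expected_b)  # True
```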

I was curious whether np.where is faster here; with the sample data repeated 100000 times, it is not.

With real data the result may differ, depending on the length of the DataFrame and the number of matched values.

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [0, 1, 2, 3, 4],
                   'B': [5, '-', '-', 8, 9],
                   'C': ['-', 'b', 'c', '-', 'e']})

#[500000 rows x 3 columns]
df = pd.concat([df] * 100000, ignore_index=True)

In [9]: %timeit df.loc[(df['B'] == '-') & (df['C'] != '-'), 'B'] = df['C']
60.7 ms ± 643 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

In [10]: %timeit df['B']=np.where((df['B']=='-')&(df['C']!='-'),df['C'],df['B'])
66 ms ± 324 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

I think the reason is that np.where processes all values, while loc assigns only the filtered values. The columns also mix strings with numbers.
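The `%timeit` magic only exists in IPython. A rough equivalent using the standard-library `timeit` module is sketched below. The helper names `with_loc` and `with_where` are hypothetical. Because an in-place assignment changes `df`, repeating it in a timing loop leaves no `'-'` to replace after the first run, so each call here works on a fresh copy (the copy cost is included in both timings). Numbers will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

base = pd.DataFrame({'A': [0, 1, 2, 3, 4],
                     'B': [5, '-', '-', 8, 9],
                     'C': ['-', 'b', 'c', '-', 'e']})
# 500000 rows x 3 columns, as in the answer
big = pd.concat([base] * 100000, ignore_index=True)


def with_loc(df):
    # Hypothetical helper: assign only the masked rows of a copy
    out = df.copy()
    mask = (out['B'] == '-') & (out['C'] != '-')
    out.loc[mask, 'B'] = out['C']
    return out


def with_where(df):
    # Hypothetical helper: rebuild the whole column of a copy with np.where
    out = df.copy()
    mask = (out['B'] == '-') & (out['C'] != '-')
    out['B'] = np.where(mask, out['C'], out['B'])
    return out


# Both approaches should agree before their timings are worth comparing
assert with_loc(big)['B'].tolist() == with_where(big)['B'].tolist()

for fn in (with_loc, with_where):
    # Best of 5 single runs, in seconds
    seconds = min(timeit.repeat(lambda: fn(big), number=1, repeat=5))
    print(f"{fn.__name__}: {seconds * 1000:.1f} ms per call")
```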

Upvotes: 1

Venkataraman R

Reputation: 12959

You can loop over the rows with iterrows and update matching values by label, as shown below:

import pandas as pd

df = pd.DataFrame({'A': [0, 1, 2, 3, 4],
                   'B': [5, '-', '-', 8, 9],
                   'C': ['-', 'b', 'c', '-', 'e']})

for index, row in df.iterrows():
    if row['B'] == '-' and row['C'] != '-':
        df.loc[index, 'B'] = df.loc[index, 'C']

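`iterrows` runs a Python-level loop over every row, so on large frames it is usually much slower than a single vectorized assignment. The sketch below wraps this loop and the `loc` version from the first answer in two hypothetical helpers, `fill_with_iterrows` and `fill_with_loc`, and checks that they produce the same frame. Both work on a copy, so the input is left unchanged.

```python
import pandas as pd


def fill_with_iterrows(df):
    """Hypothetical helper: row-by-row version of the loop above, applied to a copy."""
    out = df.copy()
    for index, row in out.iterrows():
        # row is a copy of the current row, so write back through out.loc
        if row['B'] == '-' and row['C'] != '-':
            out.loc[index, 'B'] = row['C']
    return out


def fill_with_loc(df):
    """Hypothetical helper: one boolean mask, one vectorized assignment."""
    out = df.copy()
    mask = (out['B'] == '-') & (out['C'] != '-')
    out.loc[mask, 'B'] = out['C']
    return out


df = pd.DataFrame({'A': [0, 1, 2, 3, 4],
                   'B': [5, '-', '-', 8, 9],
                   'C': ['-', 'b', 'c', '-', 'e']})

via_loop = fill_with_iterrows(df)
via_loc = fill_with_loc(df)
# Raises AssertionError if values, dtypes or labels differ
pd.testing.assert_frame_equal(via_loop, via_loc)
print(via_loop)
```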

Upvotes: 1

Subasri sridhar

Reputation: 831

Try this:

import numpy as np 
df['B']=np.where((df['B']=='-')&(df['C']!='-'),df['C'],df['B'])

The resulting DataFrame matches the expected output.
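A self-contained version of the `np.where` answer. For comparison it also shows `Series.mask`, a pandas alternative that none of the answers used (it is swapped in here, not part of the original answer): it keeps `B` where the condition is False and takes the index-aligned value from `C` where it is True.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [0, 1, 2, 3, 4],
                   'B': [5, '-', '-', 8, 9],
                   'C': ['-', 'b', 'c', '-', 'e']})

cond = (df['B'] == '-') & (df['C'] != '-')

# np.where builds a brand-new array: C where cond is True, otherwise B
via_where = np.where(cond, df['C'], df['B'])

# Series.mask: replace B with the aligned value from C where cond is True
via_mask = df['B'].mask(cond, df['C'])

df['B'] = via_where
print(df)
```

`np.where` always produces a new array for the whole column, while `mask` stays inside pandas and keeps the index; either result can be assigned back to `df['B']`.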

Upvotes: 5
