Reputation: 263
so far my dataframe looks like this:
ID Area Stage
1 P X
2 Q X
3 P X
4 Q Y
I would like to replace the area 'Q' with 'P' for every row where the Stage is equal to 'X'.
So the result should look like:
ID Area Stage
1 P X
2 P X
3 P X
4 Q Y
I tried:
data.query('Stage in ["X"]')['Area']=data.query('Stage in ["X"]')['Area'].replace('Q','P')
It does not work. Help is appreciated! :)
Upvotes: 8
Views: 3785
Reputation: 32596
Note : this not an answer proposing a new way to do, but a comparison of the execution time each needs
All the proposals in the answers are quite 'magic' doing the job in one line of code thanks to pandas/numpy, anyway to do the job is good but to do it quickly is better, so I wanted to compare the execution time of each.
Here my program, in the loops I modify the dataframe two times to let it unchanged from a turn to the next ( I am not a Python programmer as you so sorry in advance if the way to do is 'poor') :
import pandas as pd
import numpy as np
import time
df=pd.DataFrame({'ID' : [i for i in range(1,1000)],
'Area' : ['P' if (i & 1) else 'Q' for i in range(1,1000)],
'Stage' : [ 'X' if (i & 2) else 'Y' for i in range(1,1000)]})
t0=time.process_time()
for i in range(1,100):
df.loc[df['Stage']=='X', 'Area'] = df['Area'].replace('Q','q')
df.loc[df['Stage']=='X', 'Area'] = df['Area'].replace('q','Q')
print("Quang Hoang", '%.2f' % (time.process_time() - t0))
t0=time.process_time()
for i in range(1,100):
df.loc[df['Stage'] == 'X', 'Area'] = 'q'
df.loc[df['Stage'] == 'X', 'Area'] = 'Q'
print("Joe Ferndz", '%.2f' % (time.process_time() - t0))
t0=time.process_time()
for i in range(1,100):
df.loc[df['Area'].eq("Q") & df['Stage'].eq('X'),'Area']='q'
df.loc[df['Area'].eq("q") & df['Stage'].eq('X'),'Area']='Q'
print("anky 1", '%.2f' % (time.process_time() - t0))
t0=time.process_time()
for i in range(1,100):
df['Area'] = np.where(df['Area'].eq("Q") & df['Stage'].eq('X'),'q',df['Area'])
df['Area'] = np.where(df['Area'].eq("q") & df['Stage'].eq('X'),'Q',df['Area'])
print("anky 2", '%.2f' % (time.process_time() - t0))
t0=time.process_time()
for i in range(1,100):
df['Area']=np.where(df['Stage']=='X','q',df['Area'])
df['Area']=np.where(df['Stage']=='X','Q',df['Area'])
print("RavinderSingh13", '%.2f' % (time.process_time() - t0))
On my PI 4 the result is :
Quang Hoang 1.60
Joe Ferndz 1.12
anky 1 1.55
anky 2 0.86
RavinderSingh13 0.38
if I use a dataframe having 100000 lines rather than 1000 the result is :
Quang Hoang 10.79
Joe Ferndz 6.61
anky 1 10.91
anky 2 9.64
RavinderSingh13 4.75
Note the proposals of Joe Ferndz and RavinderSingh13 suppose Area is only 'P' or 'Q'
Upvotes: 3
Reputation: 8508
To update a column using value from another column, use this option:
df.loc[df['Stage'] == 'X', 'Area'] = 'P'
This will check if value of 'Stage'
is X
. If True, then it will replace value of 'Area'
to 'P'
Upvotes: 1
Reputation: 133750
Could you please try following.
import pandas as pd
import numpy as np
df['Area']=np.where(df['Stage']=='X','P',df['Area'])
Upvotes: 4
Reputation: 150805
You can use loc
to specify where you want to replace, and pass the replaced series to the assignment:
df.loc[df['Stage']=='X', 'Area'] = df['Area'].replace('Q','P')
Output:
ID Area Stage
0 1 P X
1 2 P X
2 3 P X
3 4 Q Y
Upvotes: 3
Reputation: 75130
you can use 2 boolean conditions and use loc
:
df.loc[df['Area'].eq("Q") & df['Stage'].eq('X'),'Area']='P'
print(df)
ID Area Stage
0 1 P X
1 2 P X
2 3 P X
3 4 Q Y
Or np.where
df['Area'] = np.where(df['Area'].eq("Q") & df['Stage'].eq('X'),'P',df['Area'])
Upvotes: 5