Creating a scenario:
Assume a dataframe with two series, where A is the input and B is the result of A[index]*2:
import pandas as pd
df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [2, 4, 6]})
Let's say I receive a 100k-row dataframe and search it for errors (here B->0 is invalid):
df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [2, 0, 6]})
I find the invalid rows using:
invalid_rows = df.loc[df['A']*2 != df['B']]
I have the invalid_rows now, but I am not sure what the fastest way would be to overwrite the invalid rows in the original df with the result of A[index]*2.
Iterating over the df using iterrows() is an option, but it is slow as the df grows. Can I use df.update() for this somehow?
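To make the df.update() idea concrete, here is a minimal sketch of what I have in mind. It assumes update() aligns on the index and overwrites only the rows present in the frame passed to it; the name corrected is just a placeholder I made up for illustration, and I have not measured whether this is actually fast on 100k rows.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [2, 0, 6]})

# Rows where B does not match A*2 (same check as above).
invalid_rows = df.loc[df['A'] * 2 != df['B']]

# Hypothetical helper frame: the recomputed B values for the invalid rows only,
# keeping their original index so update() can align on it.
corrected = pd.DataFrame({'B': invalid_rows['A'] * 2})

# update() modifies df in place, using the non-NA values from corrected.
df.update(corrected)

print(df)
```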
Working solution with a loop:
for row_index, my_series in df.iterrows():
    # Recompute B wherever it does not match A*2.
    if my_series['A'] * 2 != my_series['B']:
        df.loc[row_index, 'B'] = my_series['A'] * 2
But is there a faster way to do this?
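For comparison, this is a rough sketch of the kind of loop-free version I am hoping for. It assumes that assigning through df.loc with a boolean mask is a valid replacement for the loop; the variable name mask is mine, and I have not benchmarked it against iterrows().

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 2, 3],
                   'B': [2, 0, 6]})

# Boolean Series: True for every row where B is not A*2.
mask = df['A'] * 2 != df['B']

# Overwrite B only on the flagged rows; both sides share the same index labels.
df.loc[mask, 'B'] = df.loc[mask, 'A'] * 2

print(df)
```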