Reputation: 23
Given a data frame df:
Column A: [0, 1, 3, 4, 6]
Column B: [0, 0, 0, 0, 0]
The goal is to conditionally replace values in column B. If a value in column A is in a set assignedToA, the corresponding value in column B is replaced with a constant b.
For example, if b=1 and assignedToA={1,4}, the result would be:
Column A: [0, 1, 3, 4, 6]
Column B: [0, 1, 0, 1, 0]
My code for finding the matching A values and writing b into column B looks like this:
df.loc[df['A'].isin(assignedToA),'B']=b
This code works, but it is really slow for a huge dataframe. Do you have any advice on how to speed this up?
The dataframe df has around 5 million rows, and assignedToA has at most 7 values.
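For reference, here is a minimal self-contained version of the small example above. It assumes pandas is imported as pd; the intermediate name mask is only for illustration and is not part of my real code.

```python
import pandas as pd

# Small example frame from the question
df = pd.DataFrame({"A": [0, 1, 3, 4, 6], "B": [0, 0, 0, 0, 0]})
assignedToA = {1, 4}
b = 1

# Boolean mask: True where the value in A is a member of assignedToA
mask = df["A"].isin(assignedToA)

# Overwrite B only on the rows selected by the mask
df.loc[mask, "B"] = b

print(df["B"].tolist())  # [0, 1, 0, 1, 0]
```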
Upvotes: 2
Reputation: 164693
You may find a performance improvement by dropping down to numpy:
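import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [0, 1, 3, 4, 6],
                   'B': [0, 0, 0, 0, 0]})

def jp(df, vals, k):
    # Work on the underlying NumPy array instead of going through .loc
    B = df['B'].values
    # np.in1d is deprecated as of NumPy 2.0; np.isin is the newer equivalent
    B[np.in1d(df['A'], list(vals))] = k
    df['B'] = B
    return df

def original(df, vals, k):
    df.loc[df['A'].isin(vals), 'B'] = k
    return df

# Repeat the 5-row frame to get 500,000 rows for timing
df = pd.concat([df]*100000)

%timeit jp(df, {1, 4}, 1)        # 8.55 ms
%timeit original(df, {1, 4}, 1)  # 16.6 ms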
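If you want to check that a NumPy-based version gives the same result as the .loc version before timing it on your own data, something like the sketch below may help. It is a variation, not the exact function timed above: it uses np.isin and np.where, returns a copy rather than modifying the input, and the helper name replace_where_in is made up for this example. I have not timed this variant, so measure it on your own frame.

```python
import numpy as np
import pandas as pd

def replace_where_in(df, vals, k):
    """Return a copy of df with B set to k wherever A is in vals.

    Hypothetical helper for illustration; not the function timed above.
    """
    # Membership test on the raw NumPy array; a set is converted to a list first
    mask = np.isin(df["A"].to_numpy(), list(vals))
    out = df.copy()
    # Take k where the mask is True, otherwise keep the existing B value
    out["B"] = np.where(mask, k, df["B"].to_numpy())
    return out

small = pd.DataFrame({"A": [0, 1, 3, 4, 6], "B": [0, 0, 0, 0, 0]})
result = replace_where_in(small, {1, 4}, 1)

# Reference result using the .loc / isin approach from the question
expected = small.copy()
expected.loc[expected["A"].isin({1, 4}), "B"] = 1

assert result["B"].tolist() == expected["B"].tolist()
```

Returning a copy costs an extra allocation, so for a 5 million row frame you may prefer to assign the np.where result back to the original column instead.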
Upvotes: 2