Reputation: 835
I am trying to create a code that as fast as possible can update a large number of pandas values at the same time based on the values of an ID column. The code I am using atm insert multiple rows at the same time base on an ID_column, but cannot insert multiple columns + rows at the same time.
The below code is a simplied example which updates 2 columns (value1 and value2) based on the value of column name ID.
id_list = [24,26,28]
value1_list =[100,200,300]
value2_list =[125,175,165]
df.loc[df['id'].isin(id_list), 'value1'] = value1_list
df.loc[df['id'].isin(id_list), 'value2'] = value2_list
The code I ideally would use would work something like this:
df.loc[df['id'].isin(id_list), ['value1','value2']] = [value1_list,value2_list]
Meaning both column value1 and column value2 would be updated at the same time Is this doable?
Upvotes: 1
Views: 1682
Reputation: 164623
You can construct a single NumPy array via np.vstack
and then assign to multiple columns:
import numpy as np
mask = df['id'].isin(id_list)
value_array = np.vstack((value1_list, value2_list)).T
df.loc[mask, ['value1', 'value2']] = value_array
print(df)
# id value1 value2
# 0 24 100 125
# 1 26 200 175
# 2 28 300 165
# 3 30 4 8
Alternatively, you can zip
your input lists:
df.loc[mask, ['value1', 'value2']] = list(zip(*(value1_list,value2_list)))
Upvotes: 1
Reputation: 2472
You can also use dictionary to enter columns corresponding to their updates.
mask = df['id'].isin(id_list)
dc = {'value1': value1_list, 'value2': value2_list}
for i, j in dc.items():
df.loc[mask, i] = j
Upvotes: 0