Reputation: 6220
I want to update some cells in a row of a pandas DataFrame. I am using update to do it, but it always modifies the first row only. Here is an example:
import pandas as pd
df = pd.DataFrame(data={'cod':[1000,1001], 'B': ['b1','b2'], 'C':['c1','c2']})
updated_data = pd.DataFrame({'cod':[1001], 'C':['newC1']})
updated_data2 = pd.DataFrame({'cod':[1000], 'B':['newB1']})
df.update(updated_data)
df.update(updated_data2)
After this code, df will contain:
      cod      B      C
0  1000.0  newB1  newC1
1  1001.0     b2     c2
when it should be:
         cod      B      C
cod
1000  1000.0  newB1     c1
1001  1001.0     b2  newC1
To achieve this, I wrote the following code, but I do not know if it's the best approach:
df = pd.DataFrame(data={'cod':[1000,1001], 'B': ['b1','b2'], 'C':['c1','c2']})
df = df.set_index(df.cod)
updated_data = pd.DataFrame({'cod':[1001], 'C':['newC1']})
updated_data = updated_data.set_index(updated_data.cod)
df.update(updated_data, overwrite=True)
updated_data = pd.DataFrame({'cod':[1000], 'B':['newB1']})
updated_data = updated_data.set_index(updated_data.cod)
df.update(updated_data, overwrite=True)
This seems very verbose for something so simple. Is there another approach?
In the actual code, instead of two separate updated_data frames, the update happens within a loop:
df = pd.DataFrame(data={'cod':[1000,1001], 'B': ['b1','b2'], 'C':['c1','c2']})
df = df.set_index(df.cod)
for i in (1000,1001):
    updated_data = pd.DataFrame({'cod':[i], 'C':['newC1']})
    updated_data = updated_data.set_index(updated_data.cod)
    df.update(updated_data, overwrite=True)
Upvotes: 5
Views: 1939
Reputation: 639
In your case you can simply use:
df.loc[df.cod == 1001, 'C'] = 'newC1'
df.loc[df.cod == 1000, 'B'] = 'newB1'
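For context, update aligns on index labels, and both small frames in the question have the default index 0, which is why only the first row changed. A minimal self-contained sketch of the mask-based assignment, using the question's sample data, might look like this:

```python
import pandas as pd

df = pd.DataFrame({'cod': [1000, 1001], 'B': ['b1', 'b2'], 'C': ['c1', 'c2']})

# The boolean mask selects rows by the value stored in 'cod',
# not by their position or index label.
df.loc[df['cod'] == 1001, 'C'] = 'newC1'
df.loc[df['cod'] == 1000, 'B'] = 'newB1'

print(df)
```

Because the assignment goes straight into the selected cells, 'cod' should keep its integer dtype here instead of turning into floats.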
To make it faster, it's better to set cod as the index:
df = df.set_index(df.cod)
df.loc[df.index == 1001, 'C'] = 'newC1'
df.loc[df.index == 1000, 'B'] = 'newB1'
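Note that df.index == 1001 still builds a boolean mask over every row. Once cod is the index, a direct label lookup is also possible. The sketch below assumes the cod values are unique, and it uses set_index('cod'), which moves cod out of the columns (unlike set_index(df.cod) above, which keeps the column as well):

```python
import pandas as pd

df = pd.DataFrame({'cod': [1000, 1001], 'B': ['b1', 'b2'], 'C': ['c1', 'c2']})

# 'cod' becomes the index and is no longer a regular column
df = df.set_index('cod')

# Label-based lookup: the row label is the cod value itself
df.loc[1001, 'C'] = 'newC1'
df.loc[1000, 'B'] = 'newB1'

print(df)
```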
You can also assign to a list of columns at once:
df.loc[df.index == 1001, ['C', 'B']] = ['newC', 'newB']
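For the loop in the question, the same idea could be applied per iteration. This is only a sketch: the changes dictionary is a hypothetical stand-in for wherever the new values actually come from, and it assumes every cod already exists in the index and is unique.

```python
import pandas as pd

df = pd.DataFrame({'cod': [1000, 1001], 'B': ['b1', 'b2'], 'C': ['c1', 'c2']})
df = df.set_index('cod')

# Hypothetical update source: cod -> {column name: new value}
changes = {
    1001: {'C': 'newC1'},
    1000: {'B': 'newB1'},
}

for cod, new_values in changes.items():
    # Assign all changed columns of one row in a single .loc call
    df.loc[cod, list(new_values)] = list(new_values.values())

print(df)
```

This avoids building a temporary DataFrame and calling update on every iteration.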
Upvotes: 4