Reputation: 4934
Given DataFrame df:
   Id Sex  Group  Time  Time!
0  21   M      2  2.31    NaN
1   2   F      2  2.29    NaN
and update:
   Id Sex  Group  Time
0  21   M      2  2.36
1   2   F      2  2.09
2   3   F      1  1.79
I want to match on Id, Sex and Group, and either update Time! with the Time value (from the update df) if there is a match, or insert a new record if there is not.
Here is how I do it:
import numpy as np
import pandas as pd

df = df.set_index(['Id', 'Sex', 'Group'])
update = update.set_index(['Id', 'Sex', 'Group'])
for i, row in update.iterrows():
    if i in df.index:  # update
        df.loc[i, 'Time!'] = row['Time']  # .loc replaces the removed .ix
    else:  # insert new record
        cols = update.columns.values
        row = np.array(row).reshape(1, len(row))
        _ = pd.DataFrame(row, index=[i], columns=cols)
        df = pd.concat([df, _])  # DataFrame.append was removed in pandas 2.0
print(df)
              Time  Time!
Id Sex Group
21 M   2      2.31   2.36
2  F   2      2.29   2.09
3  F   1      1.79    NaN
The code seems to work, and the result above is what I want. However, I have noticed it misbehaving on a large data set, with the conditional
if i in df.index:
...
else:
...
clearly taking the wrong branch (it proceeds to else where it shouldn't, and vice versa; I suspect the MultiIndex is somehow the cause).
So my question is, do you know any other way, or a more robust version of mine, to update one df based on another df?
Upvotes: 0
Views: 6981
Reputation: 375855
I think I would do this with a merge, and then update the columns with a where. First remove the Time column from up:
In [11]: times = up.pop('Time') # up = the update DataFrame
In [12]: df1 = df.merge(up, how='outer')
In [13]: df1
Out[13]:
   Id Sex  Group  Time  Time!
0  21   M      2  2.31    NaN
1   2   F      2  2.29    NaN
2   3   F      1   NaN    NaN
Set Time! from times where Time is not NaN, and fill Time from times where it is NaN:
In [14]: df1['Time!'] = df1['Time'].where(df1['Time'].isnull(), times)
In [15]: df1['Time'] = df1['Time'].where(df1['Time'].notnull(), times)
In [16]: df1
Out[16]:
   Id Sex  Group  Time  Time!
0  21   M      2  2.31   2.36
1   2   F      2  2.29   2.09
2   3   F      1  1.79    NaN
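The two where calls above line up df1 and times by their row labels. As a variation on the same merge idea (not the exact steps above), here is a minimal sketch that keeps the incoming values inside the merge under a temporary name and uses merge's indicator column to tell matched rows from new ones. The function name upsert_times, the KEYS list and the temporary column Time_new are assumptions made up for illustration, and the sample frames are rebuilt from the question.

```python
import pandas as pd

KEYS = ["Id", "Sex", "Group"]  # match columns, taken from the question


def upsert_times(df: pd.DataFrame, update: pd.DataFrame) -> pd.DataFrame:
    """Outer-merge on KEYS: matched rows get Time!, new rows get Time."""
    # Rename the incoming column (hypothetical name) so it cannot
    # collide with df's own Time column during the merge.
    incoming = update.rename(columns={"Time": "Time_new"})
    merged = df.merge(incoming, on=KEYS, how="outer", indicator=True)

    matched = merged["_merge"] == "both"        # key present in both frames
    new_row = merged["_merge"] == "right_only"  # key only in update

    # Matched rows: record the updated value in Time!.
    merged.loc[matched, "Time!"] = merged.loc[matched, "Time_new"]
    # New rows: the incoming value becomes Time.
    merged.loc[new_row, "Time"] = merged.loc[new_row, "Time_new"]

    return merged.drop(columns=["Time_new", "_merge"])


# Sample data rebuilt from the question.
df = pd.DataFrame({
    "Id": [21, 2],
    "Sex": ["M", "F"],
    "Group": [2, 2],
    "Time": [2.31, 2.29],
    "Time!": [float("nan"), float("nan")],
})
update = pd.DataFrame({
    "Id": [21, 2, 3],
    "Sex": ["M", "F", "F"],
    "Group": [2, 2, 1],
    "Time": [2.36, 2.09, 1.79],
})

result = upsert_times(df, update)
print(result)
```

Rows that exist only in df pass through unchanged. The row order of an outer merge can differ between pandas versions, so sort on the key columns afterwards if order matters.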
Upvotes: 4