Reputation: 685
I have a dataframe that looks like this
rootID parentID jobID time counter
0 A B D 2019-01-30 14:33:21.339469 0
1 E F G 2019-01-30 14:33:21.812381 0
2 A C D 2019-01-30 15:33:21.812381 0
3 E B F 2019-01-30 15:33:21.812381 0
4 E F G 2019-01-30 16:33:21.812381 0
I will be fed live data, and the data is a rootID string, parentID string, jobID string and a date.
I want to check if the newly retrieved rootID and parentID combination already exists in the dataframe. So if I retrieve rootID = "A" and parentID == "B", jobID =="T" I want to access the first row of the dataframe. I then want to delete that row and append the new information and increment the update counter.
IF rootID and parentID exist in the same row in dataframe.
delete row and append new data with updated jobID and incremented counter.
The dataframe should now look like
rootID parentID jobID time counter
0 E F G 2019-01-30 14:33:21.812381 0
1 A C D 2019-01-30 15:33:21.812381 0
2 E B F 2019-01-30 15:33:21.812381 0
3 E F G 2019-01-30 16:33:21.812381 0
4 A B T 2019-01-30 17:33:21.339469 1
Anyone have any idea how this could be done?
Upvotes: 0
Views: 459
Reputation: 59284
I would
root_id, parent_id, job_id = get_ids() # e.g. ('A', 'B', 'T')
cond = df.rootID.eq(root_id) & df.parentID.eq(parent_id) & df.jobID.eq(job_id)
df.loc[cond, ['time', 'counter']] = datetime.datetime.now(), df.loc[cond, 'counter']+1
This would update your data frame in_place
. I wouldn't reorder the Dataframe all the time, unless it is absolutely needed. If you can, for example, re-sort it once a day, you can just
df.sort_values(by='time') #.reset_index(drop=True), if needed
periodically. However, if you absolutely have to change the rows at each time new data comes, then, assuming you have unique ids,
df = pd.concat([df[df[cond].index.item()+1:], df[cond]])
Upvotes: 1
Reputation: 340
I think you can get a pretty close solution just by translating your example :
for index, row in df.iterrows():
if row['rootID'] == rootID and row['parentID'] == parentID:
df.drop(df.iloc[index])
row['time'] = datetime.datetime.now()
row['counter'] += 1
df = df.concat(row)
Without loops:
selected_row = df[(df['rootId'] == rootID) & (df['parentID'] == parentID)])
df.drop(selected_row)
selected_row['time'] = datetime.datetime.now()
selected_row['counter'] += 1
df = df.concat(selected_row)
This assume you have only one row matching the rootID and parentID combination you are searching for.
Upvotes: 0