Reputation: 61
I have a very large dataframe.
I want to group it by the column 'id' first,
then create a new column 'reply_time' based on the other existing columns.
import pandas as pd
import numpy as np
id = ['793601486525702000','793601486525702000','793601710614802000','793601355214561000','793601355214561000','793601355214561000','793601355214561000','788130215436230000','788130215436230000','788130215436230000','788130215436230000','788130215436230000']
time = ['11/1/2016 16:53','11/1/2016 16:53','11/1/2016 16:52','11/1/2016 16:55','11/1/2016 16:53','11/1/2016 16:53','11/1/2016 16:51','11/1/2016 3:09','11/1/2016 3:04','11/1/2016 2:36','11/1/2016 2:08','11/1/2016 0:28']
reply = ['3','3','0','3','3','2','1','3','2','3','3','1']
df = pd.DataFrame({"id": id, "time": time, "reply": reply})
id                  time             reply
793601486525702000  11/1/2016 16:53  3
793601486525702000  11/1/2016 16:53  3
793601710614802000  11/1/2016 16:52  0
793601355214561000  11/1/2016 16:55  3
793601355214561000  11/1/2016 16:53  3
793601355214561000  11/1/2016 16:53  2
793601355214561000  11/1/2016 16:51  1
788130215436230000  11/1/2016 3:09   3
788130215436230000  11/1/2016 3:04   2
788130215436230000  11/1/2016 2:36   3
788130215436230000  11/1/2016 2:08   3
788130215436230000  11/1/2016 0:28   1
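There are two types of values in this new column 'reply_time': within each 'id', the row with reply '1' gets the time of the row with reply '2', and every other row gets na.
In this case, my expected output dataframe is: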
id                  time             reply  reply_time
793601486525702000  11/1/2016 16:53  3      na
793601486525702000  11/1/2016 16:53  3      na
793601710614802000  11/1/2016 16:52  0      na
793601355214561000  11/1/2016 16:55  3      na
793601355214561000  11/1/2016 16:53  3      na
793601355214561000  11/1/2016 16:53  2      na
793601355214561000  11/1/2016 16:51  1      11/1/2016 16:53
788130215436230000  11/1/2016 3:09   3      na
788130215436230000  11/1/2016 3:04   2      na
788130215436230000  11/1/2016 2:36   3      na
788130215436230000  11/1/2016 2:08   3      na
788130215436230000  11/1/2016 0:28   1      11/1/2016 3:04
I have no idea what the best way to achieve this is. Can anyone help?
Thanks in advance!
Upvotes: 0
Views: 132
Reputation: 323386
Try merge, after slicing out the reply '2' rows and replacing their reply value with '1':
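# Rows with reply '2' carry the time we need; relabel them as reply '1' so they line up in the merge
reply_times = df.query("reply == '2'").replace({'reply': {'2': '1'}}).rename(columns={'time': 'reply_time'})
yourdf = df.merge(reply_times, how='left')  # merges on the shared columns 'id' and 'reply'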
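For anyone who wants to try the idea in isolation, here is a minimal sketch on a reduced copy of the question's sample (one id with replies 3, 3, 2, 1 and one id with a single reply 0). It spells out the merge keys instead of relying on the shared columns, and it assumes each id has at most one reply '2' row; the names `reply_times` and `result` are only illustrative.

```python
import pandas as pd

# Reduced sample taken from the question's data (all values are strings, as in the question).
df = pd.DataFrame({
    "id": ["793601355214561000"] * 4 + ["793601710614802000"],
    "time": ["11/1/2016 16:55", "11/1/2016 16:53", "11/1/2016 16:53",
             "11/1/2016 16:51", "11/1/2016 16:52"],
    "reply": ["3", "3", "2", "1", "0"],
})

# Slice the reply '2' rows, relabel them as reply '1',
# and rename their time column so it becomes the new column after the merge.
reply_times = (
    df.loc[df["reply"] == "2", ["id", "reply", "time"]]
      .assign(reply="1")
      .rename(columns={"time": "reply_time"})
)

# Left merge on explicit keys: every original row is kept in order,
# only the reply '1' row finds a match, and the rest get NaN in 'reply_time'.
result = df.merge(reply_times, on=["id", "reply"], how="left")
print(result)
```

If an id could have several reply '2' rows, the left merge would duplicate the matching reply '1' row; passing `validate="many_to_one"` to `merge` makes pandas raise an error in that case instead of silently adding rows.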
Upvotes: 1