Reputation: 181
I have a dataframe with columns named 'id', 'x', 'y', and 'time':
id | time | x | y |
---|---|---|---|
1 | 0 | 14 | 12 |
1 | 1 | 32 | 23 |
1 | 2 | 52 | 14 |
2 | 2 | 12 | 34 |
3 | 0 | 62 | 17 |
3 | 1 | 82 | 35 |
3 | 2 | 22 | 25 |
I want to add two columns to the dataframe that hold the values of x and y from another row with the same id and a time that is 2 greater (time + 2).
The result should look like this:
id | time | x | y | x2 | y2 |
---|---|---|---|---|---|
1 | 0 | 14 | 12 | 52 | 14 |
1 | 1 | 32 | 23 | | |
1 | 2 | 52 | 14 | | |
2 | 2 | 12 | 34 | | |
3 | 0 | 62 | 17 | 22 | 25 |
3 | 1 | 82 | 35 | | |
3 | 2 | 22 | 25 | | |
Please note that the dataframe is not sorted by id.
I have tried the following for x2, but it does not work as intended:
t=2
data['x2'] = data.apply(lambda x: x['x'] if (data[(data['id']==x['id']) & ((data['time']+t) == x['time'])].size > 0) else '', axis=1)
The following works, but I need a more concise approach with the best possible performance, because my data is huge:
t=2
for index, row in data.iterrows():
    rowT = data[(data['id']==row['id']) & (data['time'] == (row['time'] + t))]
    if rowT.size > 0:
        data.loc[index,'x2'] = rowT['x'].values[0]
Upvotes: 3
Views: 188
Reputation: 4233
Look up time + 2 within each id:
import pandas as pd

# names chosen so the builtins id() and filter() are not shadowed
ids=[1,1,1,2,3,3,3]
time=[0,1,2,2,0,1,2]
x=[14,32,52,12,62,82,22]
y=[12,23,14,34,17,35,25]
df=pd.DataFrame({'id':ids,'time':time,'x':x,'y':y})
# default to 0 when no row with the same id and time + 2 exists
df['x2']=0
df['y2']=0
for key,item in df.iterrows():
    lookup=(item['time']+2)
    mask=(df['time']==lookup) & (df['id']==item['id'])
    results=df[mask]
    if len(results)>0:
        # take the first matching row
        row=results.iloc[0]
        x2=row.x
        y2=row.y
        df.loc[key,['x2','y2']]=(x2,y2)
print(df)
output:
id time x y x2 y2
0 1 0 14 12 52 14
1 1 1 32 23 0 0
2 1 2 52 14 0 0
3 2 2 12 34 0 0
4 3 0 62 17 22 25
5 3 1 82 35 0 0
6 3 2 22 25 0 0
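# no looping: shift time back by 2 in a copy, then left merge on id and time
# (shifting the copy rather than the left side keeps the original time values in the result)
df2=df[['id','time','x','y']].assign(time=df['time']-2)
results=df[['id','time','x','y']].merge(df2
    ,on=['id','time'],how="left",suffixes=('', '2')).fillna(0)
print(results)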
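Another loop-free option, not used in the code above, is to look the values up through a MultiIndex with reindex. This is only a minimal sketch: it assumes each (id, time) pair occurs at most once, and unmatched rows come back as NaN rather than 0.

```python
import pandas as pd

df = pd.DataFrame({
    "id":   [1, 1, 1, 2, 3, 3, 3],
    "time": [0, 1, 2, 2, 0, 1, 2],
    "x":    [14, 32, 52, 12, 62, 82, 22],
    "y":    [12, 23, 14, 34, 17, 35, 25],
})
t = 2

# Index the x/y values by (id, time); assumes the pairs are unique.
lookup = df.set_index(["id", "time"])[["x", "y"]]

# Keys to look up for every row: same id, time + t.
keys = pd.MultiIndex.from_arrays([df["id"], df["time"] + t])

# reindex yields NaN for keys that are not present in the lookup table.
found = lookup.reindex(index=keys)

# to_numpy() drops the MultiIndex so values are assigned by position.
df["x2"] = found["x"].to_numpy()
df["y2"] = found["y"].to_numpy()
print(df)
```

If (id, time) pairs can repeat, reindex will raise an error, so the merge approach is the safer choice in that case.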
Upvotes: 0
Reputation: 71689
You can create a new dataframe in which the time column is shifted back by 2 (time - 2), then left merge the original dataframe with this new dataframe on the columns id and time to get the result:
df_r = df.assign(time=df['time'].sub(2))
df.merge(df_r, on=['id', 'time'], how='left', suffixes=['', '2'])
id time x y x2 y2
0 1 0 14 12 52.0 14.0
1 1 1 32 23 NaN NaN
2 1 2 52 14 NaN NaN
3 2 2 12 34 NaN NaN
4 3 0 62 17 22.0 25.0
5 3 1 82 35 NaN NaN
6 3 2 22 25 NaN NaN
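For reuse, the same shift-and-merge idea can be wrapped in a small helper so that the offset and the columns become parameters. This is a sketch rather than a definitive implementation: the name add_shifted_columns and its arguments are made up for illustration, and it assumes each (id, time) pair appears at most once (duplicates on the right side would multiply rows in the merge).

```python
import pandas as pd

def add_shifted_columns(df, t=2, cols=("x", "y"), suffix="2"):
    """Hypothetical helper: for each row, attach the values of `cols`
    from the row with the same id and time equal to time + t (NaN if none)."""
    cols = list(cols)
    # Shift time back by t so the row at time T + t lines up with the row at time T.
    shifted = df[["id", "time"] + cols].assign(time=df["time"] - t)
    shifted = shifted.rename(columns={c: c + suffix for c in cols})
    # A left merge keeps every original row, in the original order.
    return df.merge(shifted, on=["id", "time"], how="left")

df = pd.DataFrame({
    "id":   [1, 1, 1, 2, 3, 3, 3],
    "time": [0, 1, 2, 2, 0, 1, 2],
    "x":    [14, 32, 52, 12, 62, 82, 22],
    "y":    [12, 23, 14, 34, 17, 35, 25],
})
result = add_shifted_columns(df, t=2)
print(result)
```

The new columns come back as floats because NaN marks the rows without a match; call fillna afterwards if a different placeholder is needed.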
Upvotes: 2