Reputation: 759
I have one dataframe (df1) like the following:
ATime ETime Difference
0 1444911017815 1588510 1444909429305
1 1444911144979 1715672 1444909429307
2 1444911285683 1856374 1444909429309
3 1444911432742 2003430 1444909429312
4 1444911677101 2247786 1444909429315
5 1444912444821 3015493 1444909429328
6 1444913394542 3965199 1444909429343
7 1444913844134 4414784 1444909429350
8 1444914948835 5519467 1444909429368
9 1444915840638 6411255 1444909429383
10 1444916566634 7137240 1444909429394
11 1444917379593 7950186 1444909429407
I have another very big dataframe (df2) which has a column named Absolute_Time. Absolute_Time has the format as ATime of df1. So what I want to do is, for example, for all Absolute_Time's that lay in the range of row 0 to row 1 of ETime of df1, I want to subtract row 0 of Difference of df1 and so on.
Upvotes: 0
Views: 96
Reputation: 42885
Here's an attempt to accomplish what you might be looking for, starting with:
print(df1)
ATime ETime Difference
0 1444911017815 1588510 1444909429305
1 1444911144979 1715672 1444909429307
2 1444911285683 1856374 1444909429309
3 1444911432742 2003430 1444909429312
4 1444911677101 2247786 1444909429315
5 1444912444821 3015493 1444909429328
6 1444913394542 3965199 1444909429343
7 1444913844134 4414784 1444909429350
8 1444914948835 5519467 1444909429368
9 1444915840638 6411255 1444909429383
10 1444916566634 7137240 1444909429394
11 1444917379593 7950186 1444909429407
next creating a new DataFrame
with random times within the range of df1
:
df2 = pd.DataFrame({'Absolute Time':[randrange(start=df1.ATime.iloc[0], stop=df1.ATime.iloc[-1]) for i in range(100)]})
df2 = df2.sort_values('Absolute Time').reset_index(drop=True)
np.searchsorted
provides you with the index positions
where df2
should be inserted in df1
(for the columns
in question):
df2.index = np.searchsorted(df1.ATime.values, df2.loc[:, 'Absolute Time'].values)
Assigning the new index and merging produces a new DataFrame
. Filling the missing Difference
values forward allows to subtract in the next step:
df = pd.merge(df1, df2, left_index=True, right_index=True, how='left').fillna(method='ffill').dropna().astype(int)
df['Absolute Time Adjusted'] = df['Absolute Time'].sub(df.Difference)
print(df.head())
ATime ETime Difference Absolute Time \
1 1444911144979 1715672 1444909429307 1444911018916
1 1444911144979 1715672 1444909429307 1444911138087
2 1444911285683 1856374 1444909429309 1444911138087
3 1444911432742 2003430 1444909429312 1444911303233
3 1444911432742 2003430 1444909429312 1444911359690
Absolute Time Adjusted
1 1589609
1 1708780
2 1708778
3 1873921
3 1930378
Upvotes: 1