Reputation: 112
I want to merge two dataframes to create a single time-series with two variables.
I have a function that does this by iterating over each dataframe using itterows()... which is terribly slow and doesn't take advantage of the vectorization that pandas and numpy provide...
Would you be able to help?
This code illustrates what I am trying to do:
a = pd.DataFrame(data={'timestamp':[1,2,5,6,10],'x':[2,6,3,4,2]})
b = pd.DataFrame(data={'timestamp':[2,3,4,10],'y':[3,1,2,1]})
#z = Magical line of code/function call here
#z output: {'timestamp':[1,2,3,4,5,6,10],'x':[2,6,6,6,3,4,2], 'y':[NaN,3,1,2,2,2,1] }
Upvotes: 1
Views: 492
Reputation: 93161
This can be broken down into 2 steps:
merge(..., how="outer")
ffill
z = a.merge(b, on="timestamp", how="outer").sort_values("timestamp").ffill()
Upvotes: 1