Reputation: 123
I have two pandas series with DateTimeIndex. I'd like to join these two series such that the resulting DataFrame uses the index of the first series and "matches" the values from the second series accordingly (using a linear interpolation in the second series).
First Series:
2020-03-01 1
2020-03-03 2
2020-03-05 3
2020-03-07 4
Second Series:
2020-03-01 20
2020-03-02 22
2020-03-05 25
2020-03-06 35
2020-03-07 36
2020-03-08 45
Desired Output:
2020-03-01 1 20
2020-03-03 2 23
2020-03-05 3 25
2020-03-07 4 36
Code for generating the input data:
import pandas as pd
import datetime as dt
s1 = pd.Series([1, 2, 3, 4])
s1.index = pd.to_datetime([dt.date(2020, 3, 1), dt.date(2020, 3, 3), dt.date(2020, 3, 5), dt.date(2020, 3, 7)])
s2 = pd.Series([20, 22, 25, 35, 36, 45])
s2.index = pd.to_datetime([dt.date(2020, 3, 1), dt.date(2020, 3, 2), dt.date(2020, 3, 5), dt.date(2020, 3, 6), dt.date(2020, 3, 7), dt.date(2020, 3, 8)])
Upvotes: 1
Views: 933
Reputation: 141
Thanks @codeape for the example, even if not doing what I needed I changed it to fix the problem. Here is my solution:
# Create a dataframe with two columns
df = pd.DataFrame({'room1': room1, 'room2': room2})
print(df.shape, df)
# Fill the NaN values with the previous from each column,
# then, add 0 to the non filled one (ones there is no previous value for)
# Finish by converting back to integers
df = df.fillna(method='ffill').fillna(0).astype('int64')
print(df.shape, df)
# To get the final result, add the two columns
result = df['room1'] + df['room2']
Upvotes: 0
Reputation: 862471
Use concat
with inner join:
df = pd.concat([s1, s2], axis=1, keys=('s1','s2'), join='inner')
print (df)
s1 s2
2020-03-01 1 20
2020-03-05 3 25
2020-03-07 4 36
Solution with interpolate of s2
Series and then removed rows with missing values:
df = (pd.concat([s1, s2], axis=1, keys=('s1','s2'))
.assign(s2 = lambda x: x.s2.interpolate('index'))
.dropna())
print (df)
s1 s2
2020-03-01 1.0 20.0
2020-03-03 2.0 23.0
2020-03-05 3.0 25.0
2020-03-07 4.0 36.0
Upvotes: 5
Reputation: 100766
# there are many ways to construct a dataframe from series, this uses the constructor:
df = pd.DataFrame({'s1': s1, 's2': s2})
s1 s2
2020-03-01 1.0 20.0
2020-03-02 NaN 22.0
2020-03-03 2.0 NaN
2020-03-05 3.0 25.0
2020-03-06 NaN 35.0
2020-03-07 4.0 36.0
2020-03-08 NaN 45.0
df = df.interpolate()
s1 s2
2020-03-01 1.0 20.0
2020-03-02 1.5 22.0
2020-03-03 2.0 23.5
2020-03-05 3.0 25.0
2020-03-06 3.5 35.0
2020-03-07 4.0 36.0
2020-03-08 4.0 45.0
# Only keep the rows that were in s1's index.
# Several ways to do this, but this example uses .filter
df = df.filter(s1.index, axis=0)
s1 s2
2020-03-01 1.0 20.0
2020-03-03 2.0 23.5
2020-03-05 3.0 25.0
2020-03-07 4.0 36.0
df = df.astype('int64')
s1 s2
2020-03-01 1 20
2020-03-03 2 23
2020-03-05 3 25
2020-03-07 4 36
One-liner:
df = pd.DataFrame({'s1': s1, 's2': s2}).interpolate().filter(s1.index, axis=0).astype('int64')
Documentation links:
Upvotes: 0