Reputation: 123

Join two Pandas Series with different DateTimeIndex

I have two pandas series with DateTimeIndex. I'd like to join these two series such that the resulting DataFrame uses the index of the first series and "matches" the values from the second series accordingly (using a linear interpolation in the second series).

First Series:

2020-03-01    1
2020-03-03    2
2020-03-05    3
2020-03-07    4

Second Series:

2020-03-01    20
2020-03-02    22
2020-03-05    25
2020-03-06    35
2020-03-07    36
2020-03-08    45

Desired Output:

2020-03-01    1    20
2020-03-03    2    23
2020-03-05    3    25
2020-03-07    4    36

Code for generating the input data:

import pandas as pd
import datetime as dt

s1 = pd.Series([1, 2, 3, 4])
s1.index = pd.to_datetime([dt.date(2020, 3, 1), dt.date(2020, 3, 3), dt.date(2020, 3, 5), dt.date(2020, 3, 7)])

s2 = pd.Series([20, 22, 25, 35, 36, 45])
s2.index = pd.to_datetime([dt.date(2020, 3, 1), dt.date(2020, 3, 2), dt.date(2020, 3, 5), dt.date(2020, 3, 6), dt.date(2020, 3, 7), dt.date(2020, 3, 8)])

Upvotes: 1

Answers (3)

Kodsama

Reputation: 141

Thanks @codeape for the example, even if not doing what I needed I changed it to fix the problem. Here is my solution:

# Create a dataframe with two columns
df = pd.DataFrame({'room1': room1, 'room2': room2})
print(df.shape, df)

# Fill the NaN values with the previous from each column,
# then, add 0 to the non filled one (ones there is no previous value for)
# Finish by converting back to integers
df = df.fillna(method='ffill').fillna(0).astype('int64')
print(df.shape, df)

# To get the final result, add the two columns
result = df['room1'] + df['room2']

Upvotes: 0

jezrael

Reputation: 862471

Use concat with inner join:

df = pd.concat([s1, s2], axis=1, keys=('s1','s2'), join='inner')
print (df)
            s1  s2
2020-03-01   1  20
2020-03-05   3  25
2020-03-07   4  36

Solution with interpolate of s2 Series and then removed rows with missing values:

df = (pd.concat([s1, s2], axis=1, keys=('s1','s2'))
        .assign(s2 = lambda x: x.s2.interpolate('index'))
        .dropna())
print (df)
             s1    s2
2020-03-01  1.0  20.0
2020-03-03  2.0  23.0
2020-03-05  3.0  25.0
2020-03-07  4.0  36.0

Upvotes: 5

codeape

Reputation: 100766

Construct combined dataframe

# there are many ways to construct a dataframe from series, this uses the constructor:
df = pd.DataFrame({'s1': s1, 's2': s2})
             s1    s2
2020-03-01  1.0  20.0
2020-03-02  NaN  22.0
2020-03-03  2.0   NaN
2020-03-05  3.0  25.0
2020-03-06  NaN  35.0
2020-03-07  4.0  36.0
2020-03-08  NaN  45.0

Interpolate

df = df.interpolate()
             s1    s2
2020-03-01  1.0  20.0
2020-03-02  1.5  22.0
2020-03-03  2.0  23.5
2020-03-05  3.0  25.0
2020-03-06  3.5  35.0
2020-03-07  4.0  36.0
2020-03-08  4.0  45.0

Restrict rows

# Only keep the rows that were in s1's index. 
# Several ways to do this, but this example uses .filter
df = df.filter(s1.index, axis=0)
             s1    s2
2020-03-01  1.0  20.0
2020-03-03  2.0  23.5
2020-03-05  3.0  25.0
2020-03-07  4.0  36.0

Convert numbers back to int64

df = df.astype('int64')
        s1  s2
2020-03-01   1  20
2020-03-03   2  23
2020-03-05   3  25
2020-03-07   4  36

One-liner:

df = pd.DataFrame({'s1': s1, 's2': s2}).interpolate().filter(s1.index, axis=0).astype('int64')

Documentation links: