fjurt
fjurt

Reputation: 793

Calculating the difference column wise of two time series dataframe with different dimensions

I have two DataFrames df1 (mxn) and df2 (mx1) as time series and I want to calculate the difference for each column between df1 and df2 which would look like df3.

import pandas as pd
df1 = pd.DataFrame({
    'Date':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
    '01K W':[1.2, 0.4, 0.2, -0.4], 
    '02K W':[3.5, 3.2, 'nan', 'nan'], 
    '03K W':[-1, -2.3, 0.3, 2.4], 
    '04K W':[1.5, 2.6, 3.2, 4.2]})

df2 = pd.DataFrame({
    'Date':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
    'K W':[1, 1.5, 1.2, 0.8]})

df3 = pd.DataFrame({
    'Date':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
    '01K W':[0.2, 1.1, 1, 1.2], 
    '02K W':[2.5, 1.7, 'nan', 'nan'], 
    '03K W':[2, 3.8, 0.9, 1.6], 
    '04K W':[0.5, 1.1, 2, 3.4]})

Is there an easy way to build a difference column wise?

Upvotes: 0

Views: 249

Answers (3)

Wilian
Wilian

Reputation: 1257

Another way to do:

df4 = df1[['01K W', '02K W', '03K W', '04K W']].astype(float).subtract(df2['K W'].astype(float), axis=0).abs().join(df1['Date'])[['Date','01K W', '02K W', '03K W', '04K W']]


print(df4)

Upvotes: 1

daddydan
daddydan

Reputation: 127

First you will need to use numeric values, not strings.

It looks like your 'Date' field represents your index. Pandas series can be added/subtracted element-wise based on their shared index so worth calling those out as an index. Then you can simply iterate through your df1 columns to apply df2 to each.

from numpy import nan
import pandas as pd

df1 = pd.DataFrame({
    'Date':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
    '01K W':[1.2, 0.4, 0.2, -0.4], 
    '02K W':[3.5, 3.2, nan, nan], 
    '03K W':[-1, -2.3, 0.3, 2.4], 
    '04K W':[1.5, 2.6, 3.2, 4.2]})

df2 = pd.DataFrame({
    'Date':['2021-01-01', '2021-01-02', '2021-01-03', '2021-01-04'],
    'K W':[1, 1.5, 1.2, 0.8]})

df1 = df1.set_index('Date')
df2 = df2.set_index('Date')

df3 = df1.copy()

for c in df1.columns:
    df3[c] = df1[c] - df2['K W']
    
df3

Yields:

            01K W  02K W  03K W  04K W
Date                                  
2021-01-01    0.2    2.5   -2.0    0.5
2021-01-02   -1.1    1.7   -3.8    1.1
2021-01-03   -1.0    NaN   -0.9    2.0
2021-01-04   -1.2    NaN    1.6    3.4

Upvotes: 2

Quang Hoang
Quang Hoang

Reputation: 150785

You can set Date as index, and use .sub method:

df1.set_index('Date').sub(df2.set_index('Date')['K W'], axis='rows')

Output:

            01K W  02K W  03K W  04K W
Date                                  
2021-01-01    0.2    2.5   -2.0    0.5
2021-01-02   -1.1    1.7   -3.8    1.1
2021-01-03   -1.0    NaN   -0.9    2.0
2021-01-04   -1.2    NaN    1.6    3.4

Note: you might want to add astype(float) after set_index('Date') to correct your data type.

Upvotes: 5

Related Questions