Pandas: combine columns with different time frequencies

Question

I have two dataframes: df_p and df_d.

df_p contains 8760 entries, it represents 1 year of records with 1 hour resolution. date is a datetime column, hy is the number of the hour (of the year), profile is a value

                       hy   profile
date                               
1900-01-01 00:00:00     0  0.030908
1900-01-01 01:00:00     1  0.030053
1900-01-01 02:00:00     2  0.043185
1900-01-01 03:00:00     3  0.048040
1900-01-01 04:00:00     4  0.070803
1900-01-01 05:00:00     5  0.070948
...                  ...       ...
1900-12-31 19:00:00  8755  0.058093
1900-12-31 20:00:00  8756  0.030908
1900-12-31 21:00:00  8757  0.030053
1900-12-31 22:00:00  8758  0.043185
1900-12-31 23:00:00  8759  0.048040

df_d contains 365 entries, it represents 1 year of records with 1 day resolution. date is a datetime column, HDD is a value that is valid for the whole day.

date           HDD
1900-01-01     9.823750
1900-01-02     7.747917
1900-01-03     5.198750
1900-01-04     7.547500
1900-01-05     9.755000
               ...
1900-12-25     8.549129
1900-12-26     8.245625
1900-12-27     7.641417
1900-12-28     6.176417
1900-12-29     6.368917
1900-12-30     6.443333
1900-12-31     7.224583

What I want to do is to create a new column in the df_p dataframe which, for each hour of the day, has the respective value of HDD.

The following is an example of the desired output:

                       hy   profile  HDD
date                               
1900-01-01 00:00:00     0  0.030908  9.823750
1900-01-01 01:00:00     1  0.030053  9.823750
1900-01-01 02:00:00     2  0.043185  9.823750
1900-01-01 03:00:00     3  0.048040  9.823750
1900-01-01 04:00:00     4  0.070803  9.823750
1900-01-01 05:00:00     5  0.070948  9.823750
...                  ...       ...
1900-12-31 19:00:00  8755  0.058093  7.224583
1900-12-31 20:00:00  8756  0.030908  7.224583
1900-12-31 21:00:00  8757  0.030053  7.224583
1900-12-31 22:00:00  8758  0.043185  7.224583
1900-12-31 23:00:00  8759  0.048040  7.224583

Thank you for your help.

jezrael · Accepted Answer

Use merge_asof:

#if necessary convert to datetimes
df_d['date'] = pd.to_datetime(df_d['date'])
df_p.index = pd.to_datetime(df_p.index)

df = pd.merge_asof(df_p, df_d, left_index=True, right_on='date')
print (df)
      hy   profile       date       HDH
0      0  0.030908 1900-01-01  9.823750
0      1  0.030053 1900-01-01  9.823750
0      2  0.043185 1900-01-01  9.823750
0      3  0.048040 1900-01-01  9.823750
0      4  0.070803 1900-01-01  9.823750
0      5  0.070948 1900-01-01  9.823750
11  8755  0.058093 1900-12-31  7.224583
11  8756  0.030908 1900-12-31  7.224583
11  8757  0.030053 1900-12-31  7.224583
11  8758  0.043185 1900-12-31  7.224583
11  8759  0.048040 1900-12-31  7.224583

Last for remove column date use DataFrame.drop:

df = df.drop('date', axis=1)

Pandas: combine columns with different time frequencies

Answers (2)

Extra info per comments below:

Related Questions