RaceFrog
RaceFrog

Reputation: 129

Copying value based on earliest day of month of datetimeindex

I have a dataframe with a datetimeindex and a Val column. This value changes daily.

I'd like to create a new column that will be the value that was in Val on the earliest day of the month. Notice I did not say "first" day of month. The earliest day in the index could be the 2nd, 3rd, 10th, etc... Then this value should be propagated until the earliest day in the next month.

In the example below, for 2020-01-02 I'd want 1.764052, and for this to continue until 2020-01-31. Then for 2020-02-03, I'd want 0.864436, and this to continue until the first date that appears in March.

If we can't get that far, even a boolean mask just telling if the index date is the earliest in the month would be enough, and I can sort the value column from that somehow. Thanks in advance.

import numpy as np
import pandas as pd

np.random.seed(0)
rng = pd.date_range('2020-01-02', periods=27, freq='B')
df = pd.DataFrame({ 'Val' : np.random.randn(len(rng)) }, index=rng)
df['Want'] = 99
df

                Val  Want
2020-01-02  1.764052    1.764052    
2020-01-03  0.400157    1.764052    
2020-01-06  0.978738    1.764052    
2020-01-07  2.240893    1.764052    
2020-01-08  1.867558    1.764052    
2020-01-09 -0.977278    1.764052    
2020-01-10  0.950088    1.764052    
2020-01-13 -0.151357    1.764052    
2020-01-14 -0.103219    1.764052    
2020-01-15  0.410599    1.764052    
2020-01-16  0.144044    1.764052    
2020-01-17  1.454274    1.764052    
2020-01-20  0.761038    1.764052    
2020-01-21  0.121675    1.764052    
2020-01-22  0.443863    1.764052    
2020-01-23  0.333674    1.764052    
2020-01-24  1.494079    1.764052    
2020-01-27 -0.205158    1.764052    
2020-01-28  0.313068    1.764052    
2020-01-29 -0.854096    1.764052    
2020-01-30 -2.552990    1.764052    
2020-01-31  0.653619    1.764052    
2020-02-03  0.864436    0.864436    
2020-02-04 -0.742165    0.864436    
2020-02-05  2.269755    0.864436    
2020-02-06 -1.454366    0.864436    
2020-02-07  0.045759    0.864436    
​

Upvotes: 0

Views: 132

Answers (1)

Rob Raymond
Rob Raymond

Reputation: 31166

  • groupby() year and month in index
  • transform() to return for all rows
  • lambda function to pick first value in grouped series
  • can simplify further to "first"
import numpy as np
np.random.seed(0)
rng = pd.date_range('2020-01-02', periods=27, freq='B')
df = pd.DataFrame({ 'Val' : np.random.randn(len(rng)) }, index=rng)
df['Want'] = 99
df["Want"] = df.groupby([df.index.year,df.index.month])["Val"].transform(lambda s: s[0])
df
Val Want
2020-01-02 00:00:00 1.76405 1.76405
2020-01-03 00:00:00 0.400157 1.76405
2020-01-06 00:00:00 0.978738 1.76405
2020-01-07 00:00:00 2.24089 1.76405
2020-01-08 00:00:00 1.86756 1.76405
2020-01-09 00:00:00 -0.977278 1.76405
2020-01-10 00:00:00 0.950088 1.76405
2020-01-13 00:00:00 -0.151357 1.76405
2020-01-14 00:00:00 -0.103219 1.76405
2020-01-15 00:00:00 0.410599 1.76405
2020-01-16 00:00:00 0.144044 1.76405
2020-01-17 00:00:00 1.45427 1.76405
2020-01-20 00:00:00 0.761038 1.76405
2020-01-21 00:00:00 0.121675 1.76405
2020-01-22 00:00:00 0.443863 1.76405
2020-01-23 00:00:00 0.333674 1.76405
2020-01-24 00:00:00 1.49408 1.76405
2020-01-27 00:00:00 -0.205158 1.76405
2020-01-28 00:00:00 0.313068 1.76405
2020-01-29 00:00:00 -0.854096 1.76405
2020-01-30 00:00:00 -2.55299 1.76405
2020-01-31 00:00:00 0.653619 1.76405
2020-02-03 00:00:00 0.864436 0.864436
2020-02-04 00:00:00 -0.742165 0.864436
2020-02-05 00:00:00 2.26975 0.864436
2020-02-06 00:00:00 -1.45437 0.864436
2020-02-07 00:00:00 0.0457585 0.864436

Upvotes: 1

Related Questions