Copying value based on earliest day of month of datetimeindex

Question

I have a dataframe with a datetimeindex and a Val column. This value changes daily.

I'd like to create a new column that will be the value that was in Val on the earliest day of the month. Notice I did not say "first" day of month. The earliest day in the index could be the 2nd, 3rd, 10th, etc... Then this value should be propagated until the earliest day in the next month.

In the example below, for 2020-01-02 I'd want 1.764052, and for this to continue until 2020-01-31. Then for 2020-02-03, I'd want 0.864436, and this to continue until the first date that appears in March.

If we can't get that far, even a boolean mask just telling if the index date is the earliest in the month would be enough, and I can sort the value column from that somehow. Thanks in advance.

import numpy as np
import pandas as pd

np.random.seed(0)
rng = pd.date_range('2020-01-02', periods=27, freq='B')
df = pd.DataFrame({ 'Val' : np.random.randn(len(rng)) }, index=rng)
df['Want'] = 99
df

                Val  Want
2020-01-02  1.764052    1.764052    
2020-01-03  0.400157    1.764052    
2020-01-06  0.978738    1.764052    
2020-01-07  2.240893    1.764052    
2020-01-08  1.867558    1.764052    
2020-01-09 -0.977278    1.764052    
2020-01-10  0.950088    1.764052    
2020-01-13 -0.151357    1.764052    
2020-01-14 -0.103219    1.764052    
2020-01-15  0.410599    1.764052    
2020-01-16  0.144044    1.764052    
2020-01-17  1.454274    1.764052    
2020-01-20  0.761038    1.764052    
2020-01-21  0.121675    1.764052    
2020-01-22  0.443863    1.764052    
2020-01-23  0.333674    1.764052    
2020-01-24  1.494079    1.764052    
2020-01-27 -0.205158    1.764052    
2020-01-28  0.313068    1.764052    
2020-01-29 -0.854096    1.764052    
2020-01-30 -2.552990    1.764052    
2020-01-31  0.653619    1.764052    
2020-02-03  0.864436    0.864436    
2020-02-04 -0.742165    0.864436    
2020-02-05  2.269755    0.864436    
2020-02-06 -1.454366    0.864436    
2020-02-07  0.045759    0.864436

Rob Raymond · Accepted Answer

groupby() year and month in index
transform() to return for all rows
lambda function to pick first value in grouped series
can simplify further to "first"

import numpy as np
np.random.seed(0)
rng = pd.date_range('2020-01-02', periods=27, freq='B')
df = pd.DataFrame({ 'Val' : np.random.randn(len(rng)) }, index=rng)
df['Want'] = 99
df["Want"] = df.groupby([df.index.year,df.index.month])["Val"].transform(lambda s: s[0])
df

	Val	Want
2020-01-02 00:00:00	1.76405	1.76405
2020-01-03 00:00:00	0.400157	1.76405
2020-01-06 00:00:00	0.978738	1.76405
2020-01-07 00:00:00	2.24089	1.76405
2020-01-08 00:00:00	1.86756	1.76405
2020-01-09 00:00:00	-0.977278	1.76405
2020-01-10 00:00:00	0.950088	1.76405
2020-01-13 00:00:00	-0.151357	1.76405
2020-01-14 00:00:00	-0.103219	1.76405
2020-01-15 00:00:00	0.410599	1.76405
2020-01-16 00:00:00	0.144044	1.76405
2020-01-17 00:00:00	1.45427	1.76405
2020-01-20 00:00:00	0.761038	1.76405
2020-01-21 00:00:00	0.121675	1.76405
2020-01-22 00:00:00	0.443863	1.76405
2020-01-23 00:00:00	0.333674	1.76405
2020-01-24 00:00:00	1.49408	1.76405
2020-01-27 00:00:00	-0.205158	1.76405
2020-01-28 00:00:00	0.313068	1.76405
2020-01-29 00:00:00	-0.854096	1.76405
2020-01-30 00:00:00	-2.55299	1.76405
2020-01-31 00:00:00	0.653619	1.76405
2020-02-03 00:00:00	0.864436	0.864436
2020-02-04 00:00:00	-0.742165	0.864436
2020-02-05 00:00:00	2.26975	0.864436
2020-02-06 00:00:00	-1.45437	0.864436
2020-02-07 00:00:00	0.0457585	0.864436

Copying value based on earliest day of month of datetimeindex

Answers (1)

Related Questions