Reputation: 129
I have a dataframe with a datetimeindex and a Val column. This value changes daily.
I'd like to create a new column that will be the value that was in Val on the earliest day of the month. Notice I did not say "first" day of month. The earliest day in the index could be the 2nd, 3rd, 10th, etc... Then this value should be propagated until the earliest day in the next month.
In the example below, for 2020-01-02 I'd want 1.764052, and for this to continue until 2020-01-31. Then for 2020-02-03, I'd want 0.864436, and this to continue until the first date that appears in March.
If we can't get that far, even a boolean mask just telling if the index date is the earliest in the month would be enough, and I can sort the value column from that somehow. Thanks in advance.
import numpy as np
import pandas as pd
np.random.seed(0)
rng = pd.date_range('2020-01-02', periods=27, freq='B')
df = pd.DataFrame({ 'Val' : np.random.randn(len(rng)) }, index=rng)
df['Want'] = 99
df
Val Want
2020-01-02 1.764052 1.764052
2020-01-03 0.400157 1.764052
2020-01-06 0.978738 1.764052
2020-01-07 2.240893 1.764052
2020-01-08 1.867558 1.764052
2020-01-09 -0.977278 1.764052
2020-01-10 0.950088 1.764052
2020-01-13 -0.151357 1.764052
2020-01-14 -0.103219 1.764052
2020-01-15 0.410599 1.764052
2020-01-16 0.144044 1.764052
2020-01-17 1.454274 1.764052
2020-01-20 0.761038 1.764052
2020-01-21 0.121675 1.764052
2020-01-22 0.443863 1.764052
2020-01-23 0.333674 1.764052
2020-01-24 1.494079 1.764052
2020-01-27 -0.205158 1.764052
2020-01-28 0.313068 1.764052
2020-01-29 -0.854096 1.764052
2020-01-30 -2.552990 1.764052
2020-01-31 0.653619 1.764052
2020-02-03 0.864436 0.864436
2020-02-04 -0.742165 0.864436
2020-02-05 2.269755 0.864436
2020-02-06 -1.454366 0.864436
2020-02-07 0.045759 0.864436
Upvotes: 0
Views: 132
Reputation: 31166
groupby()
year and month in indextransform()
to return for all rowslambda
function to pick first value in grouped seriesimport numpy as np
np.random.seed(0)
rng = pd.date_range('2020-01-02', periods=27, freq='B')
df = pd.DataFrame({ 'Val' : np.random.randn(len(rng)) }, index=rng)
df['Want'] = 99
df["Want"] = df.groupby([df.index.year,df.index.month])["Val"].transform(lambda s: s[0])
df
Val | Want | |
---|---|---|
2020-01-02 00:00:00 | 1.76405 | 1.76405 |
2020-01-03 00:00:00 | 0.400157 | 1.76405 |
2020-01-06 00:00:00 | 0.978738 | 1.76405 |
2020-01-07 00:00:00 | 2.24089 | 1.76405 |
2020-01-08 00:00:00 | 1.86756 | 1.76405 |
2020-01-09 00:00:00 | -0.977278 | 1.76405 |
2020-01-10 00:00:00 | 0.950088 | 1.76405 |
2020-01-13 00:00:00 | -0.151357 | 1.76405 |
2020-01-14 00:00:00 | -0.103219 | 1.76405 |
2020-01-15 00:00:00 | 0.410599 | 1.76405 |
2020-01-16 00:00:00 | 0.144044 | 1.76405 |
2020-01-17 00:00:00 | 1.45427 | 1.76405 |
2020-01-20 00:00:00 | 0.761038 | 1.76405 |
2020-01-21 00:00:00 | 0.121675 | 1.76405 |
2020-01-22 00:00:00 | 0.443863 | 1.76405 |
2020-01-23 00:00:00 | 0.333674 | 1.76405 |
2020-01-24 00:00:00 | 1.49408 | 1.76405 |
2020-01-27 00:00:00 | -0.205158 | 1.76405 |
2020-01-28 00:00:00 | 0.313068 | 1.76405 |
2020-01-29 00:00:00 | -0.854096 | 1.76405 |
2020-01-30 00:00:00 | -2.55299 | 1.76405 |
2020-01-31 00:00:00 | 0.653619 | 1.76405 |
2020-02-03 00:00:00 | 0.864436 | 0.864436 |
2020-02-04 00:00:00 | -0.742165 | 0.864436 |
2020-02-05 00:00:00 | 2.26975 | 0.864436 |
2020-02-06 00:00:00 | -1.45437 | 0.864436 |
2020-02-07 00:00:00 | 0.0457585 | 0.864436 |
Upvotes: 1