Reputation: 4842
I have a df
df = pd.DataFrame(index = ['A','B','C','D','E'],
columns = ['date_1','date_2','value_2','value_3','value_4'],
data = [['2021-06-28', '2022-05-03', 30, 40, 60],
['2022-01-10', '2022-05-15', 50, 90, 70],
[np.nan, '2022-05-15', 40, 60, 80],
[np.nan, '2022-04-28', 40, 60, 90],
[np.nan, '2022-06-28', 50, 60, 54]])
date_1 date_2 value_2 value_3 value_4
A 2021-06-28 2022-05-03 30 40 60
B 2022-01-03 2022-05-15 50 90 70
C NaN 2022-05-15 40 60 80
D NaN 2022-04-28 40 60 90
...
E NaN 2022-06-28 50 60 54
I am trying to fill NaN
values in column date_1
. The values I need to fill the date_1
column are changing every week, the min
value of date_1
value need to be2021-06-28
and the max
value is 2022-06-20
. Each week the max
value in date_1
column will be the last Monday. I need column date_1
to each date up to 2022-06-20
at least once so that each date starting from 2021-06-28
to 2022-06-20
will be in date_1
at least once. The order of these values does not matter.
I tried:
from datetime import date, timedelta
today = date.today()
last_monday = pd.to_datetime((today - timedelta(days=today.weekday()) - timedelta(days=7)).strftime('%Y-%m-%d'))
# date_mappings is a dictionary with this kind of structure:
# {1 : '2021-06-28', 2 : '2021-07-05', ... 52 : '2022-06-20'}
dates_needed = [x for x in pd.to_datetime(list(date_mappings.values())) if x >= last_monday]
So now dates_needed
has the remaining of the dates that needs to be added at least once in date_1
column.
The problem I am facing is that the shapes do not match when I try to fill the values, because there can be multiple rows with the same date_2
.
If I try to use:
df.loc[df['date_1'].isna(), 'date_1'] = dates_needed
I get:
ValueError: Must have equal len keys and value when setting with an iterable
Because this only works if I match the shape:
df.loc[df['date_1'].isna(), 'date_1'] = [pd.to_datetime('2022-01-10 00:00:00'),
pd.to_datetime('2022-01-17 00:00:00'),
pd.to_datetime('2022-01-24 00:00:00')]
date_1 date_2 value_2 value_3 value_4
A 2021-06-28 2022-05-03 30 40 60
B 2022-01-10 2022-05-15 50 90 70
C 2022-01-10 2022-05-15 40 60 80
D 2022-01-17 2022-04-28 40 60 90
E 2022-01-24 2022-06-28 50 60 54
So my goal is to fill NaN
values in date_1
from a created list dates_needed
where the each date from dates_needed
is used at least once in date_1
column and the order does not matter.
Upvotes: 0
Views: 106
Reputation: 863166
Here is solution for mapping by integers from date_mappings
by helper Index
by number of missing values by sum
. Solution working if difference between length of dict vs number of missing values:
m = df['date_1'].isna()
df.loc[m, 'date_1'] = (pd.Index(range(m.sum())) + 1).map(date_mappings)
Upvotes: 1