Jonas Palačionis
Jonas Palačionis

Reputation: 4842

Filling NaN values from a list with different shape

I have a df

df = pd.DataFrame(index = ['A','B','C','D','E'],
             columns = ['date_1','date_2','value_2','value_3','value_4'], 
             data = [['2021-06-28', '2022-05-03', 30, 40, 60],
                     ['2022-01-10', '2022-05-15', 50, 90, 70],
                     [np.nan, '2022-05-15', 40, 60, 80],
                     [np.nan,  '2022-04-28', 40, 60, 90],
                     [np.nan, '2022-06-28', 50, 60, 54]])              
    date_1              date_2          value_2     value_3     value_4
A   2021-06-28          2022-05-03      30          40          60
B   2022-01-03          2022-05-15      50          90          70
C   NaN                 2022-05-15      40          60          80
D   NaN                 2022-04-28      40          60          90
...
E   NaN                 2022-06-28      50          60          54

I am trying to fill NaN values in column date_1. The values I need to fill the date_1 column are changing every week, the min value of date_1 value need to be2021-06-28 and the max value is 2022-06-20. Each week the max value in date_1 column will be the last Monday. I need column date_1 to each date up to 2022-06-20 at least once so that each date starting from 2021-06-28 to 2022-06-20 will be in date_1 at least once. The order of these values does not matter.

I tried:

from datetime import date, timedelta

today = date.today()
last_monday = pd.to_datetime((today - timedelta(days=today.weekday())  - timedelta(days=7)).strftime('%Y-%m-%d'))

# date_mappings is a dictionary with this kind of structure:
# {1 : '2021-06-28', 2 : '2021-07-05', ... 52 : '2022-06-20'}

dates_needed = [x for x in pd.to_datetime(list(date_mappings.values())) if x >= last_monday]

So now dates_needed has the remaining of the dates that needs to be added at least once in date_1 column.

The problem I am facing is that the shapes do not match when I try to fill the values, because there can be multiple rows with the same date_2.

If I try to use:

df.loc[df['date_1'].isna(), 'date_1'] = dates_needed

I get:

ValueError: Must have equal len keys and value when setting with an iterable

Because this only works if I match the shape:

df.loc[df['date_1'].isna(), 'date_1'] = [pd.to_datetime('2022-01-10 00:00:00'),
                                        pd.to_datetime('2022-01-17 00:00:00'),
                                        pd.to_datetime('2022-01-24 00:00:00')]
    date_1         date_2           value_2     value_3     value_4
A   2021-06-28     2022-05-03       30          40          60
B   2022-01-10     2022-05-15       50          90          70
C   2022-01-10     2022-05-15       40          60          80
D   2022-01-17     2022-04-28       40          60          90
E   2022-01-24     2022-06-28       50          60          54

So my goal is to fill NaN values in date_1 from a created list dates_needed where the each date from dates_needed is used at least once in date_1 column and the order does not matter.

Upvotes: 0

Views: 106

Answers (1)

jezrael
jezrael

Reputation: 863166

Here is solution for mapping by integers from date_mappings by helper Index by number of missing values by sum. Solution working if difference between length of dict vs number of missing values:

m = df['date_1'].isna()
df.loc[m, 'date_1'] = (pd.Index(range(m.sum())) + 1).map(date_mappings)

Upvotes: 1

Related Questions