Reputation: 21

How to replace values between to dataframes using for loops and date range

**I'm new to python language and need help replacing/filling values in df2 based on df1 dataframe inputs:

Below is a description for the both Dataframes**:

<df1>

The table contains employee code and start & End time with entry type:

           Start        End entry_type
Code                                  
11111 2022-12-12 2023-02-20          M
11114 2023-02-05 2023-02-05          S
11113 2023-02-15 2023-02-15          S
11112 2023-02-18 2023-02-18          S
11114 2023-02-20 2023-02-20          V
11112 2023-02-21 2023-02-21          V
11112 2023-02-26 2023-02-26          V
11111 2023-03-06 2023-03-16          V
11114 2023-03-07 2023-03-12          T
11114 2023-03-10 2023-03-10          T
11111 2023-03-19 2023-03-19          T

df2

***** Constructing and building a df2 DataFrame from df1 **

df1.Start.min()

df1.End.max()

columns = pd.date_range(start=df1.Start.min(), end=df1.End.max(), freq='D')

index = df1.index.unique()

df2= pd.DataFrame(columns=columns, index=index)

print(df2.head())

    2022-12-12 2022-12-13 2022-12-14 2022-12-15 2022-12-16 2022-12-17  \
Code                                                                      
11111        NaN        NaN        NaN        NaN        NaN        NaN   
11114        NaN        NaN        NaN        NaN        NaN        NaN   
11113        NaN        NaN        NaN        NaN        NaN        NaN   
11112        NaN        NaN        NaN        NaN        NaN        NaN

**The requirement is to fill df2 Dataframe with the designated entrytype based on the following columns value in table df1 (Start, End, entry_type) using a for loop or any alternative method **

not sure how to approach the task correctly.

Upvotes: 2

Answers (2)

mozway

Reputation: 261900

One option using repeat with the number of days between start and end to efficiently create all the dates, and pivot_table to reshape:

df1['Start'] = pd.to_datetime(df1['Start'])
df1['End'] = pd.to_datetime(df1['End'])

tmp = df1.reset_index()

out = (
 tmp.loc[tmp.index.repeat(df1['End'].sub(df1['Start']).dt.days.add(1))]
    .assign(date=lambda d: d['Start'].add(pd.to_timedelta(d.groupby(level=0).cumcount(), unit='D')))
    .pivot_table(index='Code', columns='date', values='entry_type', aggfunc='first')
    .reindex(columns=pd.date_range(start=df1['Start'].min(), end=df1['End'].max(), freq='D'))
)

Output:

      2022-12-12 2022-12-13 2022-12-14 2022-12-15 2022-12-16 2022-12-17  \
Code                                                                      
11111          M          M          M          M          M          M   
11112        NaN        NaN        NaN        NaN        NaN        NaN   
11113        NaN        NaN        NaN        NaN        NaN        NaN   
11114        NaN        NaN        NaN        NaN        NaN        NaN   

      2022-12-18 2022-12-19 2022-12-20 2022-12-21  ... 2023-03-10 2023-03-11  \
Code                                               ...                         
11111          M          M          M          M  ...          V          V   
11112        NaN        NaN        NaN        NaN  ...        NaN        NaN   
11113        NaN        NaN        NaN        NaN  ...        NaN        NaN   
11114        NaN        NaN        NaN        NaN  ...          T          T   

      2023-03-12 2023-03-13 2023-03-14 2023-03-15 2023-03-16 2023-03-17  \
Code                                                                      
11111          V          V          V          V          V        NaN   
11112        NaN        NaN        NaN        NaN        NaN        NaN   
11113        NaN        NaN        NaN        NaN        NaN        NaN   
11114          T        NaN        NaN        NaN        NaN        NaN   

      2023-03-18 2023-03-19  
Code                         
11111        NaN          T  
11112        NaN        NaN  
11113        NaN        NaN  
11114        NaN        NaN  

[4 rows x 98 columns]

Upvotes: 0

Timeless

Reputation: 37827

IIUC, you can use pivot_table :

df2 = (df1.assign(date_rng= df1.apply(lambda x: 
             pd.date_range(start=x["Start"], end=x["End"]).tolist(), axis=1))
          .explode("date_rng").reset_index().pivot_table(
             index="code", columns="date_rng", values="entry_type", aggfunc="first")
          .rename_axis(index=None)
)

Output :

print(df2)

date_rng 2022-12-12 2022-12-13 2022-12-14  ... 2023-03-15 2023-03-16 2023-03-19
11111             M          M          M  ...          V          V          T
11112           NaN        NaN        NaN  ...        NaN        NaN        NaN
11113           NaN        NaN        NaN  ...        NaN        NaN        NaN
11114           NaN        NaN        NaN  ...        NaN        NaN        NaN

[4 rows x 85 columns]

Input used (df1) :

df1 = pd.DataFrame(
    {'Start': {11111: '2023-03-19',
      11114: '2023-03-10',
      11113: '2023-02-15',
      11112: '2023-02-26'},
      'End': {11111: '2023-03-19',
      11114: '2023-03-10',
      11113: '2023-02-15',
      11112: '2023-02-26'},
     'entry_type': {11111: 'T', 11114: 'T', 11113: 'S', 11112: 'V'}}
)

Upvotes: 2

How to replace values between to dataframes using for loops and date range

Answers (2)

Related Questions