Reputation: 21
**I'm new to python language and need help replacing/filling values in df2 based on df1 dataframe inputs:
Below is a description for the both Dataframes**:
<df1>
The table contains employee code and start & End time with entry type:
Start End entry_type
Code
11111 2022-12-12 2023-02-20 M
11114 2023-02-05 2023-02-05 S
11113 2023-02-15 2023-02-15 S
11112 2023-02-18 2023-02-18 S
11114 2023-02-20 2023-02-20 V
11112 2023-02-21 2023-02-21 V
11112 2023-02-26 2023-02-26 V
11111 2023-03-06 2023-03-16 V
11114 2023-03-07 2023-03-12 T
11114 2023-03-10 2023-03-10 T
11111 2023-03-19 2023-03-19 T
df2
***** Constructing and building a df2 DataFrame from df1 **
df1.Start.min()
df1.End.max()
columns = pd.date_range(start=df1.Start.min(), end=df1.End.max(), freq='D')
index = df1.index.unique()
df2= pd.DataFrame(columns=columns, index=index)
print(df2.head())
2022-12-12 2022-12-13 2022-12-14 2022-12-15 2022-12-16 2022-12-17 \
Code
11111 NaN NaN NaN NaN NaN NaN
11114 NaN NaN NaN NaN NaN NaN
11113 NaN NaN NaN NaN NaN NaN
11112 NaN NaN NaN NaN NaN NaN
**The requirement is to fill df2 Dataframe with the designated entrytype based on the following columns value in table df1 (Start, End, entry_type) using a for loop or any alternative method **
not sure how to approach the task correctly.
Upvotes: 2
Views: 57
Reputation: 261900
One option using repeat
with the number of days between start and end to efficiently create all the dates, and pivot_table
to reshape:
df1['Start'] = pd.to_datetime(df1['Start'])
df1['End'] = pd.to_datetime(df1['End'])
tmp = df1.reset_index()
out = (
tmp.loc[tmp.index.repeat(df1['End'].sub(df1['Start']).dt.days.add(1))]
.assign(date=lambda d: d['Start'].add(pd.to_timedelta(d.groupby(level=0).cumcount(), unit='D')))
.pivot_table(index='Code', columns='date', values='entry_type', aggfunc='first')
.reindex(columns=pd.date_range(start=df1['Start'].min(), end=df1['End'].max(), freq='D'))
)
Output:
2022-12-12 2022-12-13 2022-12-14 2022-12-15 2022-12-16 2022-12-17 \
Code
11111 M M M M M M
11112 NaN NaN NaN NaN NaN NaN
11113 NaN NaN NaN NaN NaN NaN
11114 NaN NaN NaN NaN NaN NaN
2022-12-18 2022-12-19 2022-12-20 2022-12-21 ... 2023-03-10 2023-03-11 \
Code ...
11111 M M M M ... V V
11112 NaN NaN NaN NaN ... NaN NaN
11113 NaN NaN NaN NaN ... NaN NaN
11114 NaN NaN NaN NaN ... T T
2023-03-12 2023-03-13 2023-03-14 2023-03-15 2023-03-16 2023-03-17 \
Code
11111 V V V V V NaN
11112 NaN NaN NaN NaN NaN NaN
11113 NaN NaN NaN NaN NaN NaN
11114 T NaN NaN NaN NaN NaN
2023-03-18 2023-03-19
Code
11111 NaN T
11112 NaN NaN
11113 NaN NaN
11114 NaN NaN
[4 rows x 98 columns]
Upvotes: 0
Reputation: 37827
IIUC, you can use pivot_table
:
df2 = (df1.assign(date_rng= df1.apply(lambda x:
pd.date_range(start=x["Start"], end=x["End"]).tolist(), axis=1))
.explode("date_rng").reset_index().pivot_table(
index="code", columns="date_rng", values="entry_type", aggfunc="first")
.rename_axis(index=None)
)
Output :
print(df2)
date_rng 2022-12-12 2022-12-13 2022-12-14 ... 2023-03-15 2023-03-16 2023-03-19
11111 M M M ... V V T
11112 NaN NaN NaN ... NaN NaN NaN
11113 NaN NaN NaN ... NaN NaN NaN
11114 NaN NaN NaN ... NaN NaN NaN
[4 rows x 85 columns]
Input used (df1)
:
df1 = pd.DataFrame(
{'Start': {11111: '2023-03-19',
11114: '2023-03-10',
11113: '2023-02-15',
11112: '2023-02-26'},
'End': {11111: '2023-03-19',
11114: '2023-03-10',
11113: '2023-02-15',
11112: '2023-02-26'},
'entry_type': {11111: 'T', 11114: 'T', 11113: 'S', 11112: 'V'}}
)
Upvotes: 2