Reputation: 477
I have a dataframe with name values and a date range (start/end). I need to expand/replace the dates with the ones generated by the from/to index. How can I do this?
Name date_range
NameOne_%Y%m-%d [-2,1]
NameTwo_%y%m%d [-3,1]
Desired result (Assuming that today's date is 2021-03-09 - 9 of march 2021):
Name
NameOne_202103-10
NameOne_202103-09
NameOne_202103-08
NameOne_202103-07
NameTwo_210310
NameTwo_210309
NameTwo_210308
NameTwo_210307
NameTwo_210306
I've been trying iterating over the dataframe and then generating the dates, but I still can't make it work..
for index, row in self.config_df.iterrows():
print(row['source'], row['date_range'])
days_sub=int(str(self.config_df["date_range"][0]).strip("[").strip("]").split(",")[0].strip())
days_add=int(str(self.config_df["date_range"][0]).strip("[").strip("]").split(",")[1].strip())
start_date = date.today() + timedelta(days=days_sub)
end_date = date.today() + timedelta(days=days_add)
date_range_df=pd.date_range(start=start_date, end=end_date)
date_range_df["source"]=row['source']
Any help is appreciated. Thanks!
Upvotes: 1
Views: 634
Reputation: 120391
Convert your date_range
from str
to list
with ast
module:
import ast
df = df.assign(date_range=df["date_range"].apply(ast.literal_eval)
Use date_range
to create list of dates and explode
to chain the list:
today = pd.Timestamp.today().normalize()
offset = pd.tseries.offsets.Day # shortcut
names = pd.Series([pd.date_range(today + offset(end),
today + offset(start),
freq="-1D").strftime(name)
for name, (start, end) in df.values]).explode(ignore_index=True)
>>> names
0 NameOne_202103-10
1 NameOne_202103-09
2 NameOne_202103-08
3 NameOne_202103-07
4 NameTwo_210310
5 NameTwo_210309
6 NameTwo_210308
7 NameTwo_210307
8 NameTwo_210306
dtype: object
Upvotes: 2
Reputation: 417
Alright. From your question I understand you have a starting data frame like so:
config_df = pd.DataFrame({
'name': ['NameOne_%Y-%m-%d', 'NameTwo_%y%m%d'],
'str_date_range': ['[-2,1]', '[-3,1]']})
Resulting in this:
name str_date_range
0 NameOne_%Y-%m-%d [-2,1]
1 NameTwo_%y%m%d [-3,1]
To achieve your goal and avoid iterating rows - which should be avoided using pandas - you can use groupby().apply()
like so:
def expand(row):
# Get the start_date and end_date from the row, by splitting
# the string and taking the first and last value respectively.
# .min() is required because row is technically a pd.Series
start_date = row.str_date_range.str.strip('[]').str.split(',').str[0].astype(int).min()
end_date = row.str_date_range.str.strip('[]').str.split(',').str[1].astype(int).min()
# Create a list range for from start_date to end_date.
# Note that range() does not include the end_date, therefor add 1
day_range = range(start_date, end_date+1)
# Create a Timedelta series from the day_range
days_diff = pd.to_timedelta(pd.Series(day_range), unit='days')
# Create an equally sized Series of today Timestamps
todays = pd.Series(pd.Timestamp.today()).repeat(len(day_range)-1).reset_index(drop=True)
df = todays.to_frame(name='date')
# Add days_diff to date column
df['date'] = df.date + days_diff
df['name'] = row.name
# Extract the date format from the name
date_format = row.name.split('_')[1]
# Add a column with the formatted date using the date_format string
df['date_str'] = df.date.dt.strftime(date_format=date_format)
df['name'] = df.name.str.split('_').str[0] + '_' + df.date_str
# Optional: drop columns
return df.drop(columns=['date'])
config_df.groupby('name').apply(expand).reset_index(drop=True)
returning:
name date_str
0 NameOne_2021-03-07 2021-03-07
1 NameOne_2021-03-08 2021-03-08
2 NameOne_2021-03-09 2021-03-09
3 NameTwo_210306 210306
4 NameTwo_210307 210307
5 NameTwo_210308 210308
6 NameTwo_210309 210309
Upvotes: 1