Cristian Avendaño
Cristian Avendaño

Reputation: 477

Expand values with dates in a pandas dataframe

I have a dataframe with name values and a date range (start/end). I need to expand/replace the dates with the ones generated by the from/to index. How can I do this?

Name                    date_range
NameOne_%Y%m-%d         [-2,1]
NameTwo_%y%m%d          [-3,1]        

Desired result (Assuming that today's date is 2021-03-09 - 9 of march 2021):

Name 
NameOne_202103-10 
NameOne_202103-09
NameOne_202103-08
NameOne_202103-07
NameTwo_210310
NameTwo_210309
NameTwo_210308
NameTwo_210307
NameTwo_210306

I've been trying iterating over the dataframe and then generating the dates, but I still can't make it work..

for index, row in self.config_df.iterrows():
    print(row['source'], row['date_range'])
    days_sub=int(str(self.config_df["date_range"][0]).strip("[").strip("]").split(",")[0].strip())
    days_add=int(str(self.config_df["date_range"][0]).strip("[").strip("]").split(",")[1].strip())
    start_date = date.today() + timedelta(days=days_sub)
    end_date = date.today() + timedelta(days=days_add)
    
    date_range_df=pd.date_range(start=start_date, end=end_date)
    date_range_df["source"]=row['source']

Any help is appreciated. Thanks!

Upvotes: 1

Views: 634

Answers (2)

Corralien
Corralien

Reputation: 120391

Convert your date_range from str to list with ast module:

import ast
df = df.assign(date_range=df["date_range"].apply(ast.literal_eval)

Use date_range to create list of dates and explode to chain the list:

today = pd.Timestamp.today().normalize()
offset = pd.tseries.offsets.Day  # shortcut

names = pd.Series([pd.date_range(today + offset(end),
                                 today + offset(start),
                                 freq="-1D").strftime(name)
            for name, (start, end) in df.values]).explode(ignore_index=True)
>>> names
0    NameOne_202103-10
1    NameOne_202103-09
2    NameOne_202103-08
3    NameOne_202103-07
4       NameTwo_210310
5       NameTwo_210309
6       NameTwo_210308
7       NameTwo_210307
8       NameTwo_210306
dtype: object

Upvotes: 2

nehalem
nehalem

Reputation: 417

Alright. From your question I understand you have a starting data frame like so:

config_df = pd.DataFrame({
    'name': ['NameOne_%Y-%m-%d', 'NameTwo_%y%m%d'],
    'str_date_range': ['[-2,1]', '[-3,1]']})

Resulting in this:

               name str_date_range
0  NameOne_%Y-%m-%d         [-2,1]
1    NameTwo_%y%m%d         [-3,1]

To achieve your goal and avoid iterating rows - which should be avoided using pandas - you can use groupby().apply() like so:

def expand(row):
    # Get the start_date and end_date from the row, by splitting 
    # the string and taking the first and last value respectively.
    # .min() is required because row is technically a pd.Series
    start_date = row.str_date_range.str.strip('[]').str.split(',').str[0].astype(int).min()
    end_date = row.str_date_range.str.strip('[]').str.split(',').str[1].astype(int).min()
    # Create a list range for from start_date to end_date. 
    # Note that range() does not include the end_date, therefor add 1
    day_range = range(start_date, end_date+1)
    # Create a Timedelta series from the day_range
    days_diff = pd.to_timedelta(pd.Series(day_range), unit='days')
    # Create an equally sized Series of today Timestamps
    todays = pd.Series(pd.Timestamp.today()).repeat(len(day_range)-1).reset_index(drop=True)
    df = todays.to_frame(name='date')
    # Add days_diff to date column
    df['date'] = df.date + days_diff
    df['name'] = row.name
    # Extract the date format from the name
    date_format = row.name.split('_')[1]
    # Add a column with the formatted date using the date_format string
    df['date_str'] = df.date.dt.strftime(date_format=date_format)
    df['name'] = df.name.str.split('_').str[0] + '_' + df.date_str
    # Optional: drop columns
    return df.drop(columns=['date'])

config_df.groupby('name').apply(expand).reset_index(drop=True)

returning:

                 name    date_str
0  NameOne_2021-03-07  2021-03-07
1  NameOne_2021-03-08  2021-03-08
2  NameOne_2021-03-09  2021-03-09
3      NameTwo_210306      210306
4      NameTwo_210307      210307
5      NameTwo_210308      210308
6      NameTwo_210309      210309

Upvotes: 1

Related Questions