Rod0n
Rod0n

Reputation: 1079

Fill dates with last value until current time

I've a dataframe with the following structure:

Time Company Product_type Total_sales
2021-01-31 06:00:00+00:00 Adidas Shoes 20
2021-01-31 05:00:00+00:00 Adidas Shoes 13
2021-01-31 03:00:00+00:00 Adidas Shoes 4
2021-01-31 03:00:00+00:00 Nike Shoes 5
2021-01-31 02:00:00+00:00 Adidas Shoes 3
2021-01-31 02:00:00+00:00 Nike Shoes 3

What I needed was to "fill" the time_hour gaps with the nearest previous value (time) according to their company and product_type. In this case, for Adidas, a row for the 04:00 is missing so it'll need to be filled with 4, the value from 03:00 sales.

That was solved in the following link so I now have this:

Time Company Product_type Total_sales
2021-01-31 06:00:00+00:00 Adidas Shoes 20
2021-01-31 05:00:00+00:00 Adidas Shoes 13
2021-01-31 04:00:00+00:00 Adidas Shoes 4
2021-01-31 03:00:00+00:00 Adidas Shoes 4
2021-01-31 03:00:00+00:00 Nike Shoes 5
2021-01-31 02:00:00+00:00 Adidas Shoes 3
2021-01-31 02:00:00+00:00 Nike Shoes 3

Now what I want to do is to fill it with the dates until the current time. For example today is 2021-02-05 so I need all the hours (rows) between 2021-01-31 06:00:00 and 2021-02-05 to be added with "20" for Adidas Shoes as the last total_sales value says.

I've thought about simply adding a row with the last value and the current date but it seems it won't work and would add a lot of possible bugs.

Upvotes: 1

Views: 731

Answers (2)

mullinscr
mullinscr

Reputation: 1738

This method will extend the dataframe to today's date and then fill in the relevant data up to this point.

It does this by first creating a new dataframe that consists of today's date and the companies and their products from the existing dataframe. Then it appends this new dataframe to the existing dataframe. Now when you run the code from the other answer you should get your desired result.

import datetime

import pandas as pd
import numpy as np

# create the dataframe with today's date
# and the existing companies and their products
today_df = pd.DataFrame({'Time' : [datetime.datetime.now()] * company_products.shape[0],
                  'Company': company_products['Company'],
                  'Product_type': company_products['Product_type']})

# append this to the original dataframe
df = df.append(today_df)
# ensure the 'Time' column is a datetime
df['Time'] = pd.to_datetime(df['Time'], utc=True)


# run the other answer's code
resmapled = df.groupby(['Company', 'Product_type']).resample(on='Time', rule='1H').sum().reset_index()

result = resampled.sort_values(['Company', 'Product_type', 'Time']).replace(0,np.nan).ffill()

Note: You could change the datetime.datetime.now() part of the code to be a specific date - '2021-02-28' for example.

Upvotes: 1

r-beginners
r-beginners

Reputation: 35115

The code I've created is limited to Adidas, but you can combine Nike in the same way. The main point is to update the original dataframe by creating a pd.datetime_range() from the last data of the real data to today. At that time, we will fill in the holes backward.

df['Time'] = pd.to_datetime(df['Time'])
df.set_index('Time', inplace=True)
df_adidas = df.loc[df['Company'] == 'Adidas']
tidx = pd.date_range(df_adidas.head(1).index.values[0], datetime.date.today(), freq='1H', tz='UTC')
df_adidas.reindex(tidx, method='bfill')

    Company     Product_type    Total_sales
2021-01-31 06:00:00+00:00   Adidas  Shoes   20
2021-01-31 07:00:00+00:00   Adidas  Shoes   20
2021-01-31 08:00:00+00:00   Adidas  Shoes   20
2021-01-31 09:00:00+00:00   Adidas  Shoes   20
2021-01-31 10:00:00+00:00   Adidas  Shoes   20
...     ...     ...     ...
2021-02-05 20:00:00+00:00   Adidas  Shoes   20
2021-02-05 21:00:00+00:00   Adidas  Shoes   20
2021-02-05 22:00:00+00:00   Adidas  Shoes   20
2021-02-05 23:00:00+00:00   Adidas  Shoes   20
2021-02-06 00:00:00+00:00   Adidas  Shoes   20

Upvotes: 1

Related Questions