ssurendr
ssurendr

Reputation: 456

Python/Pandas - Preparing Source Data with Weekly Columns to Time Series

I tried to google a question like this: How to transform weekly data for time series analysis in Pandas?

This question is hard to search without results that talk straight about re-sampling data from daily to weekly or something along those lines.

My question is really more to do with source data already in the form of weekly numerical data, but no time or date data like a datetime stamp.

Here is the form: (Please use the vertical bars for logical alignment of each row.)

Unique_Entity(string) | WK1(float64) | WK2(float64) | WK3(float64)| ...
UE1 | 123 | 234 | 345 | ...
UE2 | 456 | 567 | 678 | ...
UE3 | 789 | 890 | 901 | ...
... | ... | ... | ... | ...

Also WK1 is a "dynamic" description to indicate the numerical data is last week, WK2 is two weeks ago, WK3 is three weeks ago, and so on. So next week WK1's data will shift to WK2 and new data will be added to WK1. Hope that makes sense from my description.

With this being the source data format, I'd like to analyze this live data using time series tools provided by pandas and other python modules. A lot of them use an explicit date column to get their claws in for the rest of the analysis.

Wrap-Up Question: How do I transform or prepare my source data so that these tools can be easily used? (Apart from my naive solution below.)

Naive Solution: I could tag the date of the Monday (Or Friday) every week going backwards. (A function that uses today's date to then generate the dates of every Monday (Or Friday) going back.) Then I could point those time series tools to use those dates and re-sample as weeks.

This is assuming I've un-pivoted the horizontal headers so that WK1 will join with last Monday's (Or Friday's) date and so forth.

Upvotes: 0

Views: 2597

Answers (1)

John Zwinck
John Zwinck

Reputation: 249163

Create a DatetimeIndex ending today, with 1 week period in reverse, and assign it to the columns:

df.columns = pd.date_range(end=datetime.date.today(), periods=len(df.columns),
                           freq='1W-MON')[::-1]

It gives:

     2019-06-10  2019-06-03  2019-05-27
UE1         123         234         345
UE2         456         567         678
UE3         789         890         901

Transpose the result if needed.

Upvotes: 2

Related Questions