Reputation: 57
I am doing a causal impact analysis in Python. This kind of analysis helps in measuring the impact in the Treatment group post intervention when compared to a control group (A/B Testing). I read some literature from here: https://www.analytics-link.com/post/2017/11/03/causal-impact-analysis-in-r-and-now-python
Let's say my data is in the following format:
The following code works perfectly:
from causalimpact import CausalImpact
cut_off_point = 12
pre_period = [0,cut_off_point-1]
post_period = [cut_off_point,data.shape[0]-1]
impact = CausalImpact(data, pre_period, post_period)
impact.run()
impact.plot()
However, if I add a Date column and try to define the pre- and post-intervention periods by date instead, I get an error.
Say I now define the pre period and post period by date, like this:
pre_period = ['2020-04-27','2020-06-29']
post_period = ['2020-07-06','2020-07-27']
impact = CausalImpact(data, pre_period, post_period)
impact.run()
impact.plot()
I get an error:
ConversionError: Failed to convert value(s) to axis units: '2020-06-29'
I have converted the date column to the index, but I still get the error.
Can anyone please help? There seems to be limited literature online on this library and its usage in A/B testing. Thank you so much for your help!
Upvotes: 0
Views: 4213
Reputation: 11797
For those finding this question there's also the possibility of using the new tfcausalimpact library for running causal impact in Python (it was built on top of TensorFlow).
Here's an example that solves this problem with the new package:
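import pandas as pd
from causalimpact import CausalImpact  # installed by the tfcausalimpact package

# Replace the integer index with daily dates so periods can be given as date strings.
dated_data = data.set_index(pd.date_range(start='20200101', periods=len(data)))
pre_period = ['20200101', '20200311']
post_period = ['20200312', '20200409']
ci = CausalImpact(dated_data, pre_period, post_period)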
Notice that the package allows specifying the interval periods as strings, as long as the index of the input data is a pandas.DatetimeIndex.
Upvotes: 0
Reputation: 47
This didn't work for me. It raises TypeError: float() argument must be a string or a number, not 'datetime.date' on a very similar dataset (one date column plus control/test group columns). It doesn't seem to be a very general solution.
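One possible cause of that TypeError, offered as an assumption since the failing dataset is not shown, is that the date column is still a regular column holding datetime.date objects, so it gets treated as a numeric input. A minimal pandas-only sketch of moving such a column into the index (the column names "Date", "test" and "control" and the values are hypothetical):

```python
import datetime

import numpy as np
import pandas as pd

# Hypothetical data: one date column (datetime.date objects) plus test/control columns.
dates = [datetime.date(2020, 4, 27) + datetime.timedelta(weeks=i) for i in range(14)]
rng = np.random.default_rng(0)
data = pd.DataFrame({
    "Date": dates,
    "test": rng.normal(100, 5, size=14),
    "control": rng.normal(100, 5, size=14),
})

# Convert the datetime.date objects to Timestamps and use them as the index,
# so that only numeric columns remain as data columns.
data["Date"] = pd.to_datetime(data["Date"])
dated_data = data.set_index("Date")

print(type(dated_data.index).__name__)
print(dated_data.dtypes)
```

After this step the frame has a DatetimeIndex and two float columns, which is the shape the answer above assumes.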
Upvotes: 0
Reputation: 3598
Before passing the periods to CausalImpact, define them as Timestamp objects:
pre_period = [pd.to_datetime(date) for date in ['2020-04-27','2020-06-29']]
post_period = [pd.to_datetime(date) for date in ['2020-07-06','2020-07-27']]
Now each period is a list of Timestamp objects. For example, pre_period is:
[Timestamp('2020-04-27 00:00:00'), Timestamp('2020-06-29 00:00:00')]
After that try:
impact = CausalImpact(data, pre_period, post_period)
impact.run()
impact.plot()
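Before running the model, it may help to confirm that the Timestamp boundaries really exist in the index and to see how many rows each period covers. The sketch below uses only pandas; the weekly DataFrame is a hypothetical stand-in for the asker's data, which is not shown in the question.

```python
import pandas as pd

# Hypothetical weekly data standing in for the asker's DataFrame.
index = pd.date_range(start="2020-04-27", periods=14, freq="7D")
data = pd.DataFrame({"test": range(14), "control": range(14)}, index=index)

pre_period = [pd.to_datetime(d) for d in ["2020-04-27", "2020-06-29"]]
post_period = [pd.to_datetime(d) for d in ["2020-07-06", "2020-07-27"]]

# Collect any boundary that is not an actual label in the index.
missing = [ts for ts in pre_period + post_period if ts not in data.index]

# Label-based slicing on a DatetimeIndex includes both endpoints.
pre_rows = data.loc[pre_period[0]:pre_period[1]]
post_rows = data.loc[post_period[0]:post_period[1]]

print(missing, len(pre_rows), len(post_rows))
```

An empty `missing` list and non-overlapping slices suggest the periods line up with the data.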
Upvotes: 1
Reputation: 3051
It looks like your data is a dataframe, but you are providing dates in the pre_period and post_period objects, which requires your data to be a time series object instead. This is explained in the original R package documentation.
To sum up: provide indices for dataframes, provide dates for time series.
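If the data has to stay a plain integer-indexed dataframe, one way to follow this rule is to translate the date boundaries into row positions and pass those instead. A rough sketch, assuming a unique, sorted date index; the helper `to_positions` and the sample data are hypothetical and not part of any library:

```python
import pandas as pd

# Hypothetical weekly data with a date index.
index = pd.date_range(start="2020-04-27", periods=14, freq="7D")
dated_data = pd.DataFrame({"test": range(14), "control": range(14)}, index=index)


def to_positions(frame, period):
    """Map a [start, end] pair of date strings to integer row positions."""
    return [int(frame.index.get_loc(pd.Timestamp(d))) for d in period]


pre_period = to_positions(dated_data, ["2020-04-27", "2020-06-29"])
post_period = to_positions(dated_data, ["2020-07-06", "2020-07-27"])

# Drop the dates so the frame has a plain 0..n-1 integer index.
data = dated_data.reset_index(drop=True)

print(pre_period, post_period)
```

For this sample the periods come out as [0, 9] and [10, 13], which can be passed to CausalImpact the same way as the integer periods in the question's working example.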
Upvotes: 0