Data Modeler

Reputation: 57

Causal Impact Analysis in Python (A/B Testing)

I am running a causal impact analysis in Python. This kind of analysis measures the effect of an intervention on a treatment group relative to a control group (A/B testing). I have been following this write-up: https://www.analytics-link.com/post/2017/11/03/causal-impact-analysis-in-r-and-now-python

Let's say my data is in the following format:

[image of the sample data, not available]

The following code works perfectly:

from causalimpact import CausalImpact

# Rows 0..11 form the pre-intervention period; the remaining rows are post-intervention.
cut_off_point = 12
pre_period = [0, cut_off_point - 1]
post_period = [cut_off_point, data.shape[0] - 1]
impact = CausalImpact(data, pre_period, post_period)
impact.run()
impact.plot()

However, if I add a Date column and try to split the pre- and post-intervention periods by date, I get an error.

Say I now define the pre-period and post-period by date like this:

pre_period = ['2020-04-27','2020-06-29']
post_period = ['2020-07-06','2020-07-27']
impact = CausalImpact(data, pre_period, post_period)
impact.run()
impact.plot()

I get an error:

ConversionError: Failed to convert value(s) to axis units: '2020-06-29'

I have also converted the date to the index, but I am still getting the error.

Can anyone please help? There seems to be little material online about this library and its use in A/B testing. Thank you so much for your help!

Upvotes: 0

Views: 4213

Answers (4)

Willian Fuks

Reputation: 11797

For those finding this question, there is also the option of using the newer tfcausalimpact library to run causal impact in Python (it is built on top of TensorFlow).

Here's an example that solves this problem with the new package:

import pandas as pd
from causalimpact import CausalImpact

# Replace the default index with daily dates starting on 2020-01-01
dated_data = data.set_index(pd.date_range(start='20200101', periods=len(data)))

pre_period = ['20200101', '20200311']
post_period = ['20200312', '20200409']

ci = CausalImpact(dated_data, pre_period, post_period)

Notice that the package allows specifying the interval periods as strings, as long as the index of the input data is a pandas.DatetimeIndex.
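
As a minimal sketch of that requirement, assuming the data has a string Date column like the one in the question (the column names treatment and control and all values are made up for illustration), the column can be parsed and moved into the index before the frame is handed to CausalImpact:

```python
import pandas as pd

# Hypothetical weekly data covering the dates used in the question.
data = pd.DataFrame({
    "Date": pd.date_range("2020-04-27", periods=14, freq="7D").strftime("%Y-%m-%d"),
    "treatment": range(100, 114),
    "control": range(90, 104),
})

# Parse the strings and move them into the index, leaving only numeric columns.
dated_data = data.assign(Date=pd.to_datetime(data["Date"])).set_index("Date")

pre_period = ["2020-04-27", "2020-06-29"]
post_period = ["2020-07-06", "2020-07-27"]

# Sanity check: every period boundary should be present in the index.
missing = [d for d in pre_period + post_period
           if pd.Timestamp(d) not in dated_data.index]
print(type(dated_data.index).__name__, missing)
```

If the list of missing boundaries is empty, dated_data, pre_period and post_period can be passed to CausalImpact in the same way as in the example above.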

Upvotes: 0

Aaron dlC

Reputation: 47

This didn't work for me. It raises TypeError: float() argument must be a string or a number, not 'datetime.date' on a very similar dataset (one date column plus control and test group columns). It doesn't seem to be a very general solution.
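
One possible cause, and this is only a guess since the dataset isn't shown, is that the date column holds plain Python datetime.date objects (dtype object) and is still among the columns passed to the model. A small sketch with made-up data (the column names are hypothetical) that checks for this and moves the dates into a DatetimeIndex:

```python
import datetime as dt
import pandas as pd

# Hypothetical frame whose Date column holds plain datetime.date objects.
frame = pd.DataFrame({
    "Date": [dt.date(2020, 4, 27) + dt.timedelta(weeks=i) for i in range(4)],
    "test": [10.0, 11.0, 12.0, 13.0],
    "control": [9.0, 9.5, 10.0, 10.5],
})
dtype_before = frame["Date"].dtype  # object: each cell is a datetime.date

# Convert to pandas datetimes and use them as the index,
# so that only numeric columns remain in the frame.
frame["Date"] = pd.to_datetime(frame["Date"])
frame = frame.set_index("Date")
print(dtype_before, type(frame.index).__name__)
```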

Upvotes: 0

ipj

Reputation: 3598

Before passing the periods to CausalImpact, convert them to pandas Timestamps:

pre_period = [pd.to_datetime(date) for date in ['2020-04-27', '2020-06-29']]
post_period = [pd.to_datetime(date) for date in ['2020-07-06', '2020-07-27']]

Now each period is a list of pandas Timestamp objects. For example, pre_period is:

[Timestamp('2020-04-27 00:00:00'), Timestamp('2020-06-29 00:00:00')]

After that, try:

impact = CausalImpact(data, pre_period, post_period)
impact.run()
impact.plot()

Upvotes: 1

Samuel

Reputation: 3051

It looks like your data is a dataframe, but you are providing dates in the pre_period and post_period objects, which requires your data to be a time series object instead. This is explained in the original R package documentation.

To sum up: provide indices for dataframes, provide dates for time series.
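
If you want to keep a plain dataframe with an integer index, one way to follow that rule is to translate the dates into row positions first. This is only a sketch on made-up data: the helper date_to_position and the column names are hypothetical and not part of any causal impact library.

```python
import pandas as pd

# Hypothetical weekly data with a Date column and a default integer index.
data = pd.DataFrame({
    "Date": pd.date_range("2020-04-27", periods=14, freq="7D"),
    "treatment": range(100, 114),
    "control": range(90, 104),
})

def date_to_position(frame, date, column="Date"):
    """Return the row position of the first row whose `column` equals `date`."""
    hits = (frame[column] == pd.Timestamp(date)).to_numpy().nonzero()[0]
    if len(hits) == 0:
        raise KeyError(f"{date} not found in column {column!r}")
    return int(hits[0])

pre_period = [date_to_position(data, "2020-04-27"),
              date_to_position(data, "2020-06-29")]
post_period = [date_to_position(data, "2020-07-06"),
               date_to_position(data, "2020-07-27")]

# Drop the Date column so only numeric columns are passed to the model.
numeric_data = data.drop(columns="Date")
print(pre_period, post_period)
```

numeric_data can then be passed to CausalImpact together with the integer pre_period and post_period, as in the working example from the question.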

Upvotes: 0
