Transition probabilities of a binned time series

Question

My data:

data
Out[84]: 
array([ 1.79,  1.93,  1.81,  1.79,  3.87,  5.75,  7.25,  5.03, 11.11,
       11.53, 13.79,  4.41,  4.68,  7.9 ,  3.49,  1.8 ,  1.85,  2.02,
        1.81,  2.33,  2.13,  1.92,  1.74,  1.84])

I defined three ranges/bins in which each element can be:

Min, Max = data.min(), data.max()  # range of the data: 1.74 and 13.79
b = 3
binsize = (Max-Min)/b
d = {}

for i in range(0,b):
    upperlimit = Min + (i+1)*binsize
    d["Bin "+str(i)+" upper limit"] = upperlimit

d
Out[83]: 
{'Bin 0 upper limit': 5.756666666666667,
 'Bin 1 upper limit': 9.773333333333333,
 'Bin 2 upper limit': 13.790000000000001}

So Bin 0 goes from Min to 5.7567, Bin 1 from 5.7567 to 9.7733, and Bin 2 from 9.7733 to 13.79.

I would like to calculate the probability, given that an element is in one bin, that the next element will be in a certain bin: the transition probability from one bin to another, so to speak. How would I do that? I am struggling a bit to wrap my head around it.

So basically, for any point in time t (the first element of the array is at t=0, the last at t=23), I want to know the transition probability of going from one bin to another.

piterbarg · Accepted Answer

Pandas has some methods that are useful here, including binning, so you do not need to do it by hand. I hope you do not mind using pandas; in any case, it should give you an idea of how to do it "by hand" if you want to.

Let's start by putting your data into bins. Here pd.cut splits your data into 3 equal-width bins and returns a bin index for each point. We also put the original data into a dataframe as column 'x'.

import pandas as pd

b = 3
bins = pd.cut(data, b, labels = False)
df = pd.DataFrame({'x':data})
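To check that pd.cut produces essentially the same ranges as the hand-computed limits in the question, one option is to ask it for the edges as well. This is only a sketch: the variable name edges is mine, and data is re-typed from the question so the snippet runs on its own.

```python
import numpy as np
import pandas as pd

# Data copied from the question.
data = np.array([1.79, 1.93, 1.81, 1.79, 3.87, 5.75, 7.25, 5.03, 11.11,
                 11.53, 13.79, 4.41, 4.68, 7.9, 3.49, 1.8, 1.85, 2.02,
                 1.81, 2.33, 2.13, 1.92, 1.74, 1.84])

# retbins=True makes pd.cut return the bin edges alongside the bin indices.
bins, edges = pd.cut(data, 3, labels=False, retbins=True)

print(edges)  # 4 edges delimiting the 3 bins
print(bins)   # one bin index (0, 1 or 2) per data point
```

The interior edges agree with the limits computed by hand in the question. The lowest edge sits slightly below the minimum, because pd.cut widens the range a little so that the smallest value falls inside the first bin.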

Now let's put bins into df as the 'from' column, indicating which bin each x point is in, and shift it by one step to get a 'to' column showing which bin the process is in at the next step.

df['from'] = bins
df['to'] = df['from'].shift(-1)
df = df.dropna().astype(int)
df

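Now your df looks like this (the last row is dropped because it has no next step, and astype(int) also truncates x to an integer):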

      x    from    to
--  ---  ------  ----
 0    1       0     0
 1    1       0     0
 2    1       0     0
 3    1       0     0
 4    3       0     0
 5    5       0     1
 6    7       1     0
 7    5       0     2
 8   11       2     2
 9   11       2     2
10   13       2     0
11    4       0     0
12    4       0     1
13    7       1     0
14    3       0     0
15    1       0     0
16    1       0     0
17    2       0     0
18    1       0     0
19    2       0     0
20    2       0     0
21    1       0     0
22    1       0     0

Now we can group df by 'from' and 'to' and count how many times your process transitioned from a given 'from' bin to a given 'to' bin:

df.groupby(['from','to']).count().reset_index().rename(columns = {'x':'count'})

The result looks like this:


   from  to  count
0     0   0     15
1     0   1      2
2     0   2      1
3     1   0      2
4     2   0      1
5     2   2      2

E.g. your process transitioned from bin 0 to bin 1 twice, from bin 0 to bin 2 once, and so on.
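If you would rather cross-check these counts without groupby, a minimal "by hand" sketch is to pair each bin index with its successor and tally the pairs. The names pairs and counts are mine, and the binning still uses pd.cut as above.

```python
from collections import Counter

import numpy as np
import pandas as pd

# Data copied from the question.
data = np.array([1.79, 1.93, 1.81, 1.79, 3.87, 5.75, 7.25, 5.03, 11.11,
                 11.53, 13.79, 4.41, 4.68, 7.9, 3.49, 1.8, 1.85, 2.02,
                 1.81, 2.33, 2.13, 1.92, 1.74, 1.84])

bins = pd.cut(data, 3, labels=False)

# Pair the bin at step t with the bin at step t+1, then count each pair.
pairs = zip(bins[:-1], bins[1:])
counts = Counter((int(a), int(b)) for a, b in pairs)

for (src, dst), n in sorted(counts.items()):
    print(f"bin {src} -> bin {dst}: {n}")
```

With 24 data points there are 23 transitions in total, which should match the sum of the count column.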

You can get this in matrix form as well:

df.groupby(['from','to']).count().unstack(level = 1).fillna(0).astype(int)

It will look like this:


      x
to    0  1  2
from
0    15  2  1
1     2  0  0
2     1  0  2
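The matrix above holds counts, while the question asks for probabilities. One way to get there, sketched here under the assumption that you want the usual empirical estimate (each count divided by the total number of transitions leaving that 'from' bin), is pd.crosstab with normalize='index'. The names trans and P are mine.

```python
import numpy as np
import pandas as pd

# Data copied from the question.
data = np.array([1.79, 1.93, 1.81, 1.79, 3.87, 5.75, 7.25, 5.03, 11.11,
                 11.53, 13.79, 4.41, 4.68, 7.9, 3.49, 1.8, 1.85, 2.02,
                 1.81, 2.33, 2.13, 1.92, 1.74, 1.84])

bins = pd.cut(data, 3, labels=False)

# One row per transition: bin at step t and bin at step t+1.
trans = pd.DataFrame({'from': bins[:-1], 'to': bins[1:]})

# normalize='index' divides every row by its row total,
# so each row of P sums to 1.
P = pd.crosstab(trans['from'], trans['to'], normalize='index')
print(P.round(3))
```

Row i of P is then the estimated distribution of the next bin given that the process is currently in bin i. With only 23 transitions these estimates are rough: the row for bin 1, for instance, rests on just two observations.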
