Reputation: 197
My data:
data
Out[84]:
array([ 1.79, 1.93, 1.81, 1.79, 3.87, 5.75, 7.25, 5.03, 11.11,
11.53, 13.79, 4.41, 4.68, 7.9 , 3.49, 1.8 , 1.85, 2.02,
1.81, 2.33, 2.13, 1.92, 1.74, 1.84])
I defined three ranges/bins in which each element can be:
b = 3
# Min and Max are the smallest and largest values in data
Min, Max = data.min(), data.max()
binsize = (Max-Min)/b
d = {}
for i in range(0,b):
    upperlimit = Min + (i+1)*binsize
    d["Bin "+str(i)+" upper limit"] = upperlimit
d
Out[83]:
{'Bin 0 upper limit': 5.756666666666667,
'Bin 1 upper limit': 9.773333333333333,
'Bin 2 upper limit': 13.790000000000001}
So Bin 0 goes from Min to 5.756, Bin 1 from 5.756 to 9.7733, and Bin 2 from 9.7733 to 13.79.
I would like to calculate the probability, given the bin of the current element, that the next element will be in a certain bin; the transition probability from one bin to another, so to speak. How would I do that? I am struggling a bit to wrap my head around it.
So basically, at any point in time t (the first element of the array is at t=0, the last at t=23), I want to know the transition probability of going from one bin to another.
Upvotes: 2
Views: 228
Reputation: 8219
Pandas has some methods that are useful here, including binning, so you do not need to do it by hand. I hope you do not mind using pandas; in any case, it should give you an idea of how to do it "by hand" if you prefer.
Let's start by putting your data into bins. Here pd.cut will split your data into 3 equal-width bins and return a bin index for each point. We also put the original data into a dataframe as column 'x':
import pandas as pd
b = 3
bins = pd.cut(data, b, labels = False)
df = pd.DataFrame({'x':data})
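To check that pd.cut produces the same bins as the manual limits in the question, a small sketch like the following may help. It assumes data is the NumPy array from the question (repeated here so the snippet runs on its own); the names codes and edges are just illustrative.

```python
import numpy as np
import pandas as pd

# Data copied from the question
data = np.array([1.79, 1.93, 1.81, 1.79, 3.87, 5.75, 7.25, 5.03, 11.11,
                 11.53, 13.79, 4.41, 4.68, 7.9, 3.49, 1.8, 1.85, 2.02,
                 1.81, 2.33, 2.13, 1.92, 1.74, 1.84])

# retbins=True additionally returns the bin edges that pd.cut computed
codes, edges = pd.cut(data, 3, labels=False, retbins=True)

print(edges)               # 4 edges delimiting the 3 bins
print(np.bincount(codes))  # number of points that fall into each bin
```

The three upper edges should agree with the upper limits computed by hand in the question. The lowest edge is nudged slightly below the minimum so that the minimum value itself lands in bin 0.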
Now let's put bins into df as the 'from' column, indicating which bin each x point is in, and shift it to get the 'to' column, showing which bin the process is in at the next step. The last point has no next step, so dropna removes it:
df['from'] = bins
df['to'] = df['from'].shift(-1)
df = df.dropna().astype(int)
df
Now your df looks like this (note that astype(int) also truncates x to whole numbers):
x from to
-- --- ------ ----
0 1 0 0
1 1 0 0
2 1 0 0
3 1 0 0
4 3 0 0
5 5 0 1
6 7 1 0
7 5 0 2
8 11 2 2
9 11 2 2
10 13 2 0
11 4 0 0
12 4 0 1
13 7 1 0
14 3 0 0
15 1 0 0
16 1 0 0
17 2 0 0
18 1 0 0
19 2 0 0
20 2 0 0
21 1 0 0
22 1 0 0
Now we can groupby the df on 'from' and 'to' and count how many times your process transitioned from a given 'from' bin to a given 'to' bin:
df.groupby(['from','to']).count().reset_index().rename(columns = {'x':'count'})
The result looks like this:
from to count
0 0 0 15
1 0 1 2
2 0 2 1
3 1 0 2
4 2 0 1
5 2 2 2
E.g. your process transitioned from bin 0 to bin 1 two times, from bin 0 to bin 2 once, and so on.
You can get this in matrix form as well:
df.groupby(['from','to']).count().unstack(level = 1).fillna(0).astype(int)
It will look like this:
x
to 0 1 2
from
0 15 2 1
1 2 0 0
2 1 0 2
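The matrix above holds counts, while the question asks for probabilities. One way to get there, sketched below, is to divide each row by its row total, which gives the empirical probability of landing in each 'to' bin given the 'from' bin. The snippet uses pd.crosstab with normalize='index' as a shortcut for the groupby/unstack steps; this is one possible approach rather than the only one, and the variable name probs is illustrative.

```python
import numpy as np
import pandas as pd

# Data copied from the question
data = np.array([1.79, 1.93, 1.81, 1.79, 3.87, 5.75, 7.25, 5.03, 11.11,
                 11.53, 13.79, 4.41, 4.68, 7.9, 3.49, 1.8, 1.85, 2.02,
                 1.81, 2.33, 2.13, 1.92, 1.74, 1.84])

# Bin index of every point (0, 1 or 2)
bins = pd.cut(data, 3, labels=False)

# Pair each point's bin with the bin of the next point
df = pd.DataFrame({'from': bins[:-1], 'to': bins[1:]})

# normalize='index' divides every row by its row total,
# so each row becomes P(to | from) and sums to 1
probs = pd.crosstab(df['from'], df['to'], normalize='index')
print(probs.round(3))
```

For this data, the first row works out to 15/18, 2/18 and 1/18, i.e. a point in bin 0 is followed by another bin 0 point about 83% of the time. With only 23 observed transitions (and just 2 out of bin 1), these are rough estimates.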
Upvotes: 3