time series sliding window with occurrence counts

Question

I am trying to count how many times each letter occurs within fixed time windows of timestamped data.

For example:

time    letter
  1     A
  4     B
  5     C
  9     C
  18    B
  30    A
  30    B

I divide the time range into windows of size (1 + 30) / 30, which is roughly 1, and I want to know how many A, B and C values fall in each window:

timeseries  A  B  C
1           1  0  0
2           0  0  0
...
30          1  1  0

This should give me a table of 30 rows and 3 columns (A, B, C) holding the number of occurrences.

The problem is that breaking the data down takes too long, because the code iterates through the entire master table for every window to slice the data, even though the data is already sorted.

# mytable is my existing pandas DataFrame with "timestamp" and "channel" columns
master = mytable

minimum = master.timestamp.min()
maximum = master.timestamp.max()

# window size, roughly 1 for the example data
window = (minimum + maximum) / maximum

wstart = minimum
wend = minimum + window

concurrent_tasks = []

while wstart <= maximum:
    As = 0
    Bs = 0
    Cs = 0
    # slow part: every window rescans the whole table
    for d, row in master.iterrows():
        ttime = row.timestamp
        if wstart <= ttime < wend:
            if row.channel == 'A':
                As = As + 1
            elif row.channel == 'B':
                Bs = Bs + 1
            elif row.channel == 'C':
                Cs = Cs + 1

    # store the window start with its counts (the original used an undefined m_id here)
    concurrent_tasks.append([wstart, As, Bs, Cs])

    wstart = wstart + window
    wend = wend + window

Could you help me make this perform better? I would like to use a map function, and I want to avoid looping through the whole table for every window.

This is part of a big data set, and it is currently taking days to finish.

Thank you.


Answers (1)
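
One possible approach, sketched here under assumptions rather than taken from the original answer: instead of rescanning the table for every window, compute a window index for each row in one vectorized step and then count per (window, letter) pair. The sketch assumes the column names timestamp and channel from the question's code, numeric timestamps, and a window size of 1 (the variable window is a placeholder to adjust). The sample DataFrame simply reproduces the example data from the question.

```python
import pandas as pd

# Example data from the question; replace with the real table.
master = pd.DataFrame({
    "timestamp": [1, 4, 5, 9, 18, 30, 30],
    "channel": ["A", "B", "C", "C", "B", "A", "B"],
})

window = 1  # assumed window size; adjust as needed
minimum = master["timestamp"].min()
maximum = master["timestamp"].max()

# Map every row to the index of the window it falls in (no Python loop).
bin_index = ((master["timestamp"] - minimum) // window).astype(int)

# Count rows per (window index, channel) in a single pass.
counts = pd.crosstab(bin_index, master["channel"])

# Add the empty windows (and any missing letters) as zeros.
n_windows = int((maximum - minimum) // window) + 1
counts = counts.reindex(index=range(n_windows), columns=["A", "B", "C"], fill_value=0)

# Label each row with the start time of its window.
counts.index = minimum + counts.index * window
counts.index.name = "timeseries"

print(counts)
```

With this layout each row is touched once when the window index is computed and once when it is counted, so the work grows with the number of rows rather than with rows times windows. The data does not need to be sorted for this to work.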
