Reputation: 9
Specifically, I have a time series of a disease passing through a horse population. What I want to do is re-bin the data by cases rather than by time, so that the df keeps its current order but each row covers exactly 1,000 cases. If a row has too many cases, it should be split into new rows; if it has too few, it should be merged with the following row(s), averaging the inputs weighted by their constituent case counts, so that I can get the heteroscedasticity out of the data. I realize I could do this with a massive loop, but I'm wondering whether there are any less computationally intensive apply methods that would accomplish the same task. In the example below, Time 0 would create four new rows; the last of those, holding the remaining 699 cases, would merge with the 230 cases from Time 1 plus 71 from Time 2, averaging their severity and states weighted by the number of input cases.
Time  Severity  Cases  States
0     4         3699   39
1     7         230    15
2     2         1300   27
3     3         740    13
4     2         3000   23
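For concreteness, the case-weighted averages for that merged row (699 leftover Time-0 cases + 230 from Time 1 + 71 from Time 2 = 1,000 cases) work out, with plain arithmetic on the example numbers, to:

severity = (699*4  + 230*7  + 71*2)  / 1000   # 4.548
states   = (699*39 + 230*15 + 71*27) / 1000   # 32.628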
Upvotes: 0
Views: 61
Reputation: 9
For anyone in the future who finds this question: from what I can tell, there's no apply that will transform the TS. Quang's answer works great; however, as he pointed out, it really only works with small TSs, otherwise the whole thing crashes due to memory limitations. There are two ways to skirt this. The first is to create a new df, run the above script on fractions of the whole TS, and append the results to the new df (see the sketch below). The other option, so that it isn't slow as molasses and doesn't hammer memory so severely, is to do the loops in Cython instead. You can find an example of the Cython loops code here: https://github.com/Jackal08/financial-data-structures/blob/master/cython_loops.c
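As an illustration of the first option, here's a minimal sketch of the chunked version (the function name rebin_by_cases and the bin_size/chunk_rows parameters are mine, purely illustrative, and untested on the real data). The trick is to carry the running case count across chunks and to keep case-weighted sums, rather than means, inside each chunk, so that bins spanning a chunk boundary can be recombined exactly at the end:

import numpy as np
import pandas as pd

def rebin_by_cases(df, bin_size=1000, chunk_rows=50):
    offset = 0    # cases consumed so far, carried across chunks
    pieces = []
    for start in range(0, len(df), chunk_rows):
        chunk = df.iloc[start:start + chunk_rows]
        # expand only this chunk to one row per case
        expanded = chunk.loc[chunk.index.repeat(chunk.Cases)].assign(Cases=1)
        expanded['groups'] = (offset + np.arange(len(expanded))) // bin_size
        offset += len(expanded)
        # sums (one row = one case), so boundary bins add up exactly later
        pieces.append(expanded.groupby('groups')
                              .agg({'Time': 'first',
                                    'Cases': 'sum',
                                    'Severity': 'sum',
                                    'States': 'sum'}))
    out = (pd.concat(pieces)
             .groupby(level=0)
             .agg({'Time': 'first', 'Cases': 'sum',
                   'Severity': 'sum', 'States': 'sum'}))
    # convert the per-bin sums back into case-weighted means
    out['Severity'] /= out['Cases']
    out['States'] /= out['Cases']
    return out

Peak memory is then bounded by the largest expanded chunk rather than by the whole expanded TS.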
Upvotes: 0
Reputation: 150825
IIUC, this should do:
import numpy as np
import pandas as pd

# this will blow up your dataframe by `Cases`,
# so be aware if you have lengthy data
new_df = (df.loc[df.index.repeat(df.Cases)]           # one row per individual case
            .assign(Cases=1,                          # every expanded row is one case
                    groups=lambda x: np.arange(len(x))//1000  # bin id per 1000 cases
                    )
            .groupby('groups')
            .agg({'Time': 'first',
                  'Cases': 'sum',
                  'Severity': 'mean',
                  'States': 'mean'})
          )
Output:
        Time  Cases  Severity  States
groups
0          0   1000     4.000  39.000
1          0   1000     4.000  39.000
2          0   1000     4.000  39.000
3          0   1000     4.548  32.628
4          2   1000     2.000  27.000
5          2   1000     2.740  16.516
6          4   1000     2.000  23.000
7          4   1000     2.000  23.000
8          4    969     2.000  23.000
Upvotes: 1