Reputation: 4687
given a list of indices that match a condition, where there will be many spans in the list that are sequentially adjacent, how can I easily select only the first of each span.
such that
magicallySelect([1,2,3,10,11,12,100,101,102]) == [1,10,100]
but -- importantly, this should also work for other indicies, like dates (which is the case in my data). The actual code I'm hoping to get working is:
original.reset_index(inplace=True)
predict = {}
for app in apps:
reg = linear_model.LinearRegression()
reg.fit(original.index.values.reshape(-1, 1), original[app].values)
slope = reg.coef_.tolist()[0]
delta = original[app].apply(lambda x: abs(slope - x))
forecast['test_delta'] = forecast[app].apply(lambda x: abs(slope - x))
tdm = forecast['test_delta'].mean()
tds = forecast['test_delta'].std(ddof=0)
# identify moments that are σ>2 abnormal
forecast['z'] = forecast['test_delta'].apply(lambda x: abs(x - tdm / tds))
sig = forecast.index[forecast[forecast['z'] > 2]].tolist()
predict[app] = FIRST_INDEX_IN_EACH_SPAN_OF(sig)
Upvotes: 0
Views: 37
Reputation: 4680
l = [1,2,3,10,11,12,100,101,102]
indices = [l[i] for i in range(len(l)) if l[i-1]!=l[i]-1]
Reordering this slightly to work for datetimes, this would give you all items in the list where the gap from the previous item is greater than 1 day (plus the first item by default):
indices = [l[0]] + [l[i] for i in range(len(l)) if (l[i]-l[i-1]).days>1]
For a difference in time measured in minutes, you can convert to seconds and substitute this in. E.g. for 15 minutes (900 seconds) you can do:
indices = [l[0]] + [l[i] for i in range(len(l)) if (l[i]-l[i-1]).seconds>900]
Upvotes: 1