roberto tomás

Reputation: 4687

How to get the first index of each span of matching indices in a DataFrame

Given a list of indices that match a condition, where the list contains many spans of sequentially adjacent values, how can I easily select only the first index of each span?

For example:

magicallySelect([1,2,3,10,11,12,100,101,102]) == [1,10,100]

Importantly, this should also work for other kinds of indices, like dates (which is the case in my data). The actual code I'm hoping to get working is:

original.reset_index(inplace=True)

predict = {}
for app in apps:
    reg = linear_model.LinearRegression()
    reg.fit(original.index.values.reshape(-1, 1), original[app].values)

    slope = reg.coef_.tolist()[0]
    delta = original[app].apply(lambda x: abs(slope - x))

    forecast['test_delta'] = forecast[app].apply(lambda x: abs(slope - x))
    tdm = forecast['test_delta'].mean()
    tds = forecast['test_delta'].std(ddof=0)

    # identify moments that are σ>2 abnormal
    forecast['z'] = forecast['test_delta'].apply(lambda x: abs((x - tdm) / tds))
    sig = forecast.index[forecast['z'] > 2].tolist()

    predict[app] = FIRST_INDEX_IN_EACH_SPAN_OF(sig)

Upvotes: 0

Views: 37

Answers (1)

Toby Petty

Reputation: 4680

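l = [1, 2, 3, 10, 11, 12, 100, 101, 102]
# Keep each item that is not exactly 1 more than the previous item (assumes l is sorted).
# For i == 0, l[i-1] wraps around to the last item, so the first item is always kept.
indices = [l[i] for i in range(len(l)) if l[i-1] != l[i] - 1]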

Adapting this slightly to work for datetimes, the following gives you every item in the list whose gap from the previous item is greater than 1 day, plus the first item by default. The loop starts at index 1 so that the first item is never compared against the last one:

indices = [l[0]] + [l[i] for i in range(1, len(l)) if (l[i] - l[i-1]).days > 1]
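As a rough sketch, the same idea can be wrapped in a small helper that works for integers and dates alike. The name `first_of_each_span` and its `max_gap` parameter are made up for illustration, and the input is assumed to be sorted in ascending order:

```python
from datetime import date, timedelta

def first_of_each_span(items, max_gap):
    """Return the items that start a new run (hypothetical helper).

    Assumes `items` is sorted ascending and that subtracting two items
    gives something comparable to `max_gap` (int for ints, timedelta for dates).
    """
    if not items:
        return []
    # The first item always starts a run; a later item starts one only when
    # the gap from its predecessor is larger than max_gap.
    return [items[0]] + [
        cur for prev, cur in zip(items, items[1:]) if cur - prev > max_gap
    ]

# Integer indices: consecutive values differ by exactly 1.
ints = first_of_each_span([1, 2, 3, 10, 11, 12, 100, 101, 102], 1)
# ints == [1, 10, 100]

# Daily dates: consecutive days differ by exactly one day.
days = [date(2020, 1, 1), date(2020, 1, 2), date(2020, 1, 3),
        date(2020, 2, 10), date(2020, 2, 11)]
starts = first_of_each_span(days, timedelta(days=1))
# starts == [date(2020, 1, 1), date(2020, 2, 10)]
```

Comparing whole `timedelta` objects instead of their `.days` attribute means the same function also handles gaps expressed in hours or minutes.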

For a gap measured in minutes, compare the difference in seconds instead, using total_seconds() rather than the .seconds attribute. E.g. for 15 minutes (900 seconds) you can do:

indices = [l[0]] + [l[i] for i in range(1, len(l)) if (l[i] - l[i-1]).total_seconds() > 900]
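One thing to watch with `timedelta`: the `.seconds` attribute holds only the seconds component (0 to 86399) and ignores whole days, whereas `total_seconds()` returns the full duration. The timestamps below are invented purely to illustrate the difference:

```python
from datetime import datetime

# A gap of 1 day and 5 minutes between two made-up timestamps.
gap = datetime(2020, 1, 2, 0, 5) - datetime(2020, 1, 1, 0, 0)
seconds_component = gap.seconds    # 300: the whole day is not counted
total = gap.total_seconds()        # 86700.0: the full duration in seconds

# A negative difference, as happens when index 0 is compared with l[-1]:
# timedelta stores -2 hours as -1 day plus 22 hours.
wrapped = datetime(2020, 1, 1, 0, 0) - datetime(2020, 1, 1, 2, 0)
wrapped_days = wrapped.days        # -1
wrapped_seconds = wrapped.seconds  # 79200, which would pass a "> 900" test
```

So a check based on `.seconds` can miss gaps longer than a day, and it can also include the first item twice if the comprehension starts at index 0.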

Upvotes: 1
