M A

Reputation: 45

Find the occurrences of events

I have an array of five different events; each event occurs more than once, over intervals of different lengths.

Example:

array(['walking', 'walking', 'walking', 'walking', 'Running', 'Running',
       'Running', 'Running', 'walking', 'walking', 'walking', 'walking',
       'walking', 'Standing', 'Standing', 'Standing', 'walking', 'walking',
       'walking'], dtype='<U8')

... (3245 elements long)

For each event, I want to extract an array of the intervals (first and last index) in which it occurs.

For the example above, the result for walking should be:

Walking_occurence = [
(0,3),
(8,12),
(16,18)
]

Upvotes: 4

Views: 77

Answers (2)

Valdi_Bo

Reputation: 30971

I took your list of activities as a plain Python list:

act = ['walking', 'walking', 'walking', 'walking', 'running', 'running',
    'running', 'running', 'walking', 'walking', 'walking', 'walking', 'walking',
    'standing', 'standing', 'standing', 'walking', 'walking', 'walking']

Then the steps to perform are as follows:

  1. Import itertools (needed in step 3) and pandas as pd.

  2. Create a DataFrame from act:

    df = pd.Series(act).to_frame(name='activity')
    
  3. Generate the rows for an auxiliary DataFrame:

    rows = []
    for k, g in itertools.groupby(df.itertuples(name='row'), lambda row: row.activity):
        grp = list(g)
        rows.append([(grp[0].Index, grp[-1].Index), k])
    

    Note that itertools.groupby differs from the pandas version of groupby in one detail: each change in the key between consecutive elements opens a new group, so separate runs of the same activity end up in separate groups.

    So rows now contains:

    [[(0, 3), 'walking'],
     [(4, 7), 'running'],
     [(8, 12), 'walking'],
     [(13, 15), 'standing'],
     [(16, 18), 'walking']]
    
  4. Create the auxiliary DataFrame:

    df2 = pd.DataFrame(rows, columns=['id', 'activity'])
    
  5. Generate the final result:

    df2.groupby('activity').id.agg(list)
    

The result is:

activity
running                        [(4, 7)]
standing                     [(13, 15)]
walking     [(0, 3), (8, 12), (16, 18)]
Name: id, dtype: object

For example, walking maps to a single list of (from, to) tuples, just as you wanted.
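The same itertools.groupby idea also works without building any DataFrame. A minimal sketch, assuming plain Python lists are acceptable as output (occurrences is a hypothetical name); enumerate carries each element's index along so the first and last index of every run can be read off directly:

```python
from collections import defaultdict
from itertools import groupby

act = ['walking', 'walking', 'walking', 'walking', 'running', 'running',
       'running', 'running', 'walking', 'walking', 'walking', 'walking', 'walking',
       'standing', 'standing', 'standing', 'walking', 'walking', 'walking']

# itertools.groupby opens a new group whenever the key changes, so each
# group is exactly one run of consecutive identical activities.
occurrences = defaultdict(list)
for activity, run in groupby(enumerate(act), key=lambda pair: pair[1]):
    indices = [i for i, _ in run]
    occurrences[activity].append((indices[0], indices[-1]))

print(dict(occurrences))
# {'walking': [(0, 3), (8, 12), (16, 18)], 'running': [(4, 7)], 'standing': [(13, 15)]}
```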

Upvotes: 1

Chris Adams

Reputation: 18647

Here is a potential approach using pandas.Series with cumsum and groupby:

import numpy as np
import pandas as pd

a = np.array(['walking', 'walking', 'walking', 'walking', 'Running',
              'Running', 'Running', 'Running', 'walking', 'walking',
              'walking', 'walking', 'walking', 'Standing', 'Standing',
              'Standing', 'walking', 'walking', 'walking'])

s = pd.Series(a)

s_out = ((s != s.shift()).cumsum().reset_index()
          .groupby([0, s])['index']
          .agg(['min', 'max'])
          .apply(tuple, axis=1))

# print(s_out)
# 1  walking       (0, 3)
# 2  Running       (4, 7)
# 3  walking      (8, 12)
# 4  Standing    (13, 15)
# 5  walking     (16, 18)
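The key step is (s != s.shift()).cumsum(), which gives every run of identical values its own number. A small sketch of that intermediate on the same data (the names is_new_run and run_id are only for illustration):

```python
import numpy as np
import pandas as pd

a = np.array(['walking', 'walking', 'walking', 'walking', 'Running',
              'Running', 'Running', 'Running', 'walking', 'walking',
              'walking', 'walking', 'walking', 'Standing', 'Standing',
              'Standing', 'walking', 'walking', 'walking'])
s = pd.Series(a)

# True wherever a value differs from the previous one; the first element is
# compared against the NaN introduced by shift(), so it is always True.
is_new_run = s != s.shift()

# The cumulative sum of those booleans numbers the runs 1, 2, 3, ...
run_id = is_new_run.cumsum()
print(run_id.tolist())
# [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3, 3, 4, 4, 4, 5, 5, 5]
```

Grouping by this run id together with the activity is what keeps the three separate walking runs apart; the min and max of the original index within each group are then the interval bounds.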

You could then do a further groupby operation to get your desired results:

s_out = s_out.groupby(level=1, sort=False).apply(np.array)

[out]

walking     [(0, 3), (8, 12), (16, 18)]
Running                        [(4, 7)]
Standing                     [(13, 15)]
dtype: object
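If plain Python lists keyed by activity are more convenient (closer to the Walking_occurence list in the question), the grouped Series can be turned into a dict. A sketch assuming that output shape; occurrences and walking_occurrence are hypothetical names, and apply(list) is used in place of apply(np.array):

```python
import numpy as np
import pandas as pd

a = np.array(['walking', 'walking', 'walking', 'walking', 'Running',
              'Running', 'Running', 'Running', 'walking', 'walking',
              'walking', 'walking', 'walking', 'Standing', 'Standing',
              'Standing', 'walking', 'walking', 'walking'])
s = pd.Series(a)

# One (min, max) index tuple per run, indexed by (run id, activity).
s_out = ((s != s.shift()).cumsum().reset_index()
          .groupby([0, s])['index']
          .agg(['min', 'max'])
          .apply(tuple, axis=1))

# Collect the tuples per activity into lists, keeping first-appearance order,
# then convert the resulting Series to a dict.
occurrences = s_out.groupby(level=1, sort=False).apply(list).to_dict()
walking_occurrence = occurrences['walking']
```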

Upvotes: 5
