Reputation: 45
I have an array containing five different events; each event occurs more than once, over intervals of varying length.
Example:
array(['walking', 'walking', 'walking', 'walking', 'Running', 'Running',
'Running', 'Running', 'walking', 'walking', 'walking', 'walking',
'walking', 'Standing', 'Standing', 'Standing', 'walking', 'walking',
'walking'], dtype='<U8')
... (3245 elements long)
For each event, I want to extract an array of the index intervals over which that event occurs.
For the example above, the result should look like this:
Walking_occurence = [
(0,3),
(8,12),
(16,18)
]
Upvotes: 4
Views: 77
Reputation: 30971
I took your list of activities as a plain Python list:
act = ['walking', 'walking', 'walking', 'walking', 'running', 'running',
'running', 'running', 'walking', 'walking', 'walking', 'walking', 'walking',
'standing', 'standing', 'standing', 'walking', 'walking', 'walking']
Then the steps to perform are as follows:
import itertools
import pandas as pd
(both will be needed soon).
Create a DataFrame from act:
df = pd.Series(act).to_frame(name='activity')
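Generate the rows for an auxiliary DataFrame:
rows = []
# Each group is one run of consecutive rows sharing the same activity
for k, g in itertools.groupby(df.itertuples(name='row'), lambda row: row.activity):
    grp = list(g)
    # Store (first index, last index) of the run together with its activity
    rows.append([(grp[0].Index, grp[-1].Index), k])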
Note that itertools.groupby differs from the pandas groupby in one important detail: it only groups consecutive elements, so every change in the key opens a new group, even if that key has appeared before.
So the result is:
[[(0, 3), 'walking'],
[(4, 7), 'running'],
[(8, 12), 'walking'],
[(13, 15), 'standing'],
[(16, 18), 'walking']]
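To illustrate the consecutive-grouping behaviour noted above, here is a minimal sketch on a toy list (the list seq is an illustrative assumption, not the asker's data):

```python
import itertools

# Toy sequence: 'a' appears in two separate runs
seq = ['a', 'a', 'b', 'a']

# itertools.groupby opens a new group at every change of key,
# so the two runs of 'a' are reported separately.
runs = [(key, len(list(group))) for key, group in itertools.groupby(seq)]
print(runs)  # [('a', 2), ('b', 1), ('a', 1)]
```

A pandas groupby on the same values would instead merge both runs of 'a' into one group, which is why itertools.groupby is used here.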
Create the auxiliary DataFrame:
df2 = pd.DataFrame(rows, columns=['id', 'activity'])
Generate the final result:
df2.groupby('activity').id.agg(list)
The result is:
activity
running [(4, 7)]
standing [(13, 15)]
walking [(0, 3), (8, 12), (16, 18)]
Name: id, dtype: object
For example, walking maps to a single list of (from, to) tuples, just as you wanted.
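If pandas is not needed for anything else, the same idea can probably be written with the standard library alone. The following is only a sketch under that assumption; the helper name intervals_by_label is hypothetical and not part of any library:

```python
import itertools
from collections import defaultdict

def intervals_by_label(labels):
    """Map each label to a list of (start, end) index pairs, end inclusive."""
    result = defaultdict(list)
    start = 0
    for key, group in itertools.groupby(labels):
        length = sum(1 for _ in group)  # size of this consecutive run
        result[key].append((start, start + length - 1))
        start += length  # the next run begins right after this one
    return dict(result)

act = ['walking', 'walking', 'walking', 'walking', 'running', 'running',
       'running', 'running', 'walking', 'walking', 'walking', 'walking', 'walking',
       'standing', 'standing', 'standing', 'walking', 'walking', 'walking']

intervals = intervals_by_label(act)
print(intervals['walking'])  # [(0, 3), (8, 12), (16, 18)]
```

This tracks the running start index by hand instead of relying on the DataFrame index, so it works on any iterable of labels.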
Upvotes: 1
Reputation: 18647
Here is a potential approach using pandas.Series with cumsum and groupby:
import numpy as np
import pandas as pd
a = np.array(['walking', 'walking', 'walking', 'walking', 'Running',
'Running', 'Running', 'Running', 'walking', 'walking',
'walking', 'walking', 'walking', 'Standing', 'Standing',
'Standing', 'walking', 'walking', 'walking'])
s = pd.Series(a)
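# Number each run of identical consecutive values, then take the
# first and last index of every (run number, activity) group.
s_out = ((s != s.shift()).cumsum().reset_index()
         .groupby([0, s])['index']
         .agg(['min', 'max'])
         .apply(tuple, axis=1))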
# print(s_out)
# 1 walking (0, 3)
# 2 Running (4, 7)
# 3 walking (8, 12)
# 4 Standing (13, 15)
# 5 walking (16, 18)
You could then do a further groupby operation to get your desired result:
s_out = s_out.groupby(level=1, sort=False).apply(np.array)
[out]
walking [(0, 3), (8, 12), (16, 18)]
Running [(4, 7)]
Standing [(13, 15)]
dtype: object
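Since the original data is a NumPy array, the run boundaries could also be found with NumPy alone by locating the positions where a value differs from its predecessor. This is a rough sketch rather than a tested drop-in; the function name run_intervals is hypothetical, and it assumes a non-empty 1-D array:

```python
import numpy as np

def run_intervals(a):
    """Return {label: [(start, end), ...]} for runs of equal values (end inclusive)."""
    a = np.asarray(a)
    # Indices where an element differs from the one before it
    change = np.flatnonzero(a[1:] != a[:-1]) + 1
    starts = np.concatenate(([0], change))
    ends = np.concatenate((change - 1, [len(a) - 1]))
    out = {}
    for s, e in zip(starts.tolist(), ends.tolist()):
        out.setdefault(str(a[s]), []).append((s, e))
    return out

a = np.array(['walking', 'walking', 'walking', 'walking', 'Running',
              'Running', 'Running', 'Running', 'walking', 'walking',
              'walking', 'walking', 'walking', 'Standing', 'Standing',
              'Standing', 'walking', 'walking', 'walking'])

result = run_intervals(a)
print(result['walking'])  # [(0, 3), (8, 12), (16, 18)]
```

The returned dictionary keeps labels in order of first appearance, matching the sort=False output above.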
Upvotes: 5