Reputation: 1
I have a data frame, which is the annotation of animal behavior recording frame by frame. But due to the limitation of the annotation software, the exported data frame is like this:
Frame stop Behavior
Frame start
0 2 turning
2 5 immobility
5 9 mild movement
9 11 immobility
16 20 jump
Code:
data = {
'Frame start': [0, 2, 5, 9, 16],
'Frame stop': [2, 5, 9, 11, 20],
'Behavior': ['turning', 'immobility', 'mild movement', 'immobility', 'jump']}
df = pd.DataFrame(data)
df.set_index('Frame start', inplace=True)
I want to modify the data frame to be like this:
Frame start
0 turning
1 turning
2 immobility
3 immobility
4 immobility
5 mild movement
6 mild movement
7 mild movement
8 mild movement
9 immobility
10 immobility
11 NaN
12 NaN
13 NaN
14 NaN
15 NaN
16 jump
17 jump
18 jump
19 jump
How can I do that?
Since I did not annotate every frame of a video, there are gaps of frames with no values, i.e. between index 4 and 5, frame 11-15 in this example. Therefore, I cannot re-index the data frame using the "frame start" column and simply <ffill> to fill up the missing values, which will look like the following:
new_index = [*range(0, 20)]
new_df = df.reindex(new_index, method='ffill')
final_df = new_df.loc[:, 'Behavior']
print(final_df)
Frame start
0 turning
1 turning
2 immobility
3 immobility
4 immobility
5 mild movement
6 mild movement
7 mild movement
8 mild movement
9 immobility
10 immobility
11 immobility
12 immobility
13 immobility
14 immobility
15 immobility
16 jump
17 jump
18 jump
19 jump
Upvotes: 0
Views: 84
Reputation: 13257
Use IntervalIndex
idx = pd.IntervalIndex.from_arrays(df.index, df['Frame stop'], closed='left')
val = range(df['Frame stop'].max()) # or set manually val = range(20)
out = pd.Series(val).map(df['Behavior'].set_axis(idx))
out
0 turning
1 turning
2 immobility
3 immobility
4 immobility
5 mild movement
6 mild movement
7 mild movement
8 mild movement
9 immobility
10 immobility
11 NaN
12 NaN
13 NaN
14 NaN
15 NaN
16 jump
17 jump
18 jump
19 jump
Upvotes: 1
Reputation: 25393
Another possible solution, which uses:
np.arange
to generate an array frames
that ranges from the first index of df
to the last value in the Frame stop
column.
np.searchsorted
to find the indices idx
in df.index
where the values in frames
should be inserted to maintain order, adjusted by subtracting 1.
np.where
to check if each value in frames
is less than the corresponding value in the Frame stop
column of df
, indexed by idx
. If true, it selects the corresponding value from the Behavior
column; otherwise, it assigns NaN
.
pd.Series
to create a new series with the resulting values, using frames
as the index and naming the series Behavior
.
pd.Series(
np.where(
(frames := np.arange(df.index[0], df['Frame stop'].iloc[-1]))
< df['Frame stop'].iloc[
(idx := np.searchsorted(df.index, frames, side='right') - 1)],
df['Behavior'].iloc[idx],
np.nan),
index=frames,
name='Behavior')
Output:
0 turning
1 turning
2 immobility
3 immobility
4 immobility
5 mild movement
6 mild movement
7 mild movement
8 mild movement
9 immobility
10 immobility
11 NaN
12 NaN
13 NaN
14 NaN
15 NaN
16 jump
17 jump
18 jump
19 jump
Name: Behavior, dtype: object
Upvotes: 0
Reputation: 101
Because you are forward-filling the Null values in Behavior
and Frame stop
after reindexing on a continuous Frame start
, you only need to Null-out the Behavior
values when the Frame stop
is less than or equal to the Frame start
index. You could do that with numpy.where()
or pandas df.loc
to overwrite Behavior
:
import numpy as np
# you could use numpy.where for overwriting Behavior, or .loc
#new_df['Behavior'] = np.where(new_df['Frame stop'] <= new_df.index, np.nan, new_df['Behavior'])
new_df.loc[new_df['Frame stop'] <= new_df.index, 'Behavior'] = np.nan
print(new_df)
Frame stop Behavior
Frame start
0 2 turning
1 2 turning
2 5 immobility
3 5 immobility
4 5 immobility
5 9 mild movement
6 9 mild movement
7 9 mild movement
8 9 mild movement
9 11 immobility
10 11 immobility
11 11 NaN
12 11 NaN
13 11 NaN
14 11 NaN
15 11 NaN
16 20 jump
17 20 jump
18 20 jump
19 20 jump
Upvotes: 1