Reputation: 2410
I have a dataframe below where seq_name
is the name of the video, pedestrian_id
is the name of the pedestrian in that video and frame_no
are the frames that contain said pedestrian in the video. Thus for the example below, I have 2 videos 0001
and 0002
and 3 total pedestrians.
I am trying to group them to assign them a global_id
based on this information such that each unique pedestrian is given an ID from 1 to total number of pedestrians.
seq_name label pedestrian_id frame_no global_id
0 0001 crossing 0001 0001.png -1
1 0001 crossing 0001 0002.png -1
2 0001 crossing 0001 0003.png -1
3 0001 crossing 0001 0004.png -1
4 0001 crossing 0001 0005.png -1
5 0001 crossing 0001 0006.png -1
6 0001 crossing 0001 0007.png -1
7 0001 crossing 0001 0008.png -1
8 0001 crossing 0001 0009.png -1
9 0001 crossing 0001 0010.png -1
10 0001 crossing 0002 0001.png -1
11 0001 crossing 0002 0002.png -1
12 0001 crossing 0002 0003.png -1
13 0001 crossing 0002 0004.png -1
14 0001 crossing 0002 0005.png -1
15 0001 crossing 0002 0006.png -1
16 0001 crossing 0002 0007.png -1
17 0001 crossing 0002 0008.png -1
18 0001 crossing 0002 0009.png -1
19 0001 crossing 0002 0010.png -1
20 0002 crossing 0001 0001.png -1
21 0002 crossing 0001 0002.png -1
22 0002 crossing 0001 0003.png -1
23 0002 crossing 0001 0004.png -1
24 0002 crossing 0001 0005.png -1
25 0002 crossing 0001 0006.png -1
26 0002 crossing 0001 0007.png -1
27 0002 crossing 0001 0008.png -1
28 0002 crossing 0001 0009.png -1
29 0002 crossing 0001 0010.png -1
The final dataframe should thus look like this
seq_name label pedestrian_id frame_no global_id
0 0001 crossing 0001 0001.png 1
1 0001 crossing 0001 0002.png 1
2 0001 crossing 0001 0003.png 1
3 0001 crossing 0001 0004.png 1
4 0001 crossing 0001 0005.png 1
5 0001 crossing 0001 0006.png 1
6 0001 crossing 0001 0007.png 1
7 0001 crossing 0001 0008.png 1
8 0001 crossing 0001 0009.png 1
9 0001 crossing 0001 0010.png 1
10 0001 crossing 0002 0001.png 2
11 0001 crossing 0002 0002.png 2
12 0001 crossing 0002 0003.png 2
13 0001 crossing 0002 0004.png 2
14 0001 crossing 0002 0005.png 2
15 0001 crossing 0002 0006.png 2
16 0001 crossing 0002 0007.png 2
17 0001 crossing 0002 0008.png 2
18 0001 crossing 0002 0009.png 2
19 0001 crossing 0002 0010.png 2
20 0002 crossing 0001 0001.png 3
21 0002 crossing 0001 0002.png 3
22 0002 crossing 0001 0003.png 3
23 0002 crossing 0001 0004.png 3
24 0002 crossing 0001 0005.png 3
25 0002 crossing 0001 0006.png 3
26 0002 crossing 0001 0007.png 3
27 0002 crossing 0001 0008.png 3
28 0002 crossing 0001 0009.png 3
29 0002 crossing 0001 0010.png 3
Looks very easy but I am new to pandas and am having a lot of trouble. The only thing I have done is to group them based on seq_name
and pedestrian_id
before appending a new column. But when I printed the data it no longer looks similar to the one Id like above plus it looks like global_id has been appended to a new row instead of a new column. Would like some help.
df = df.groupby(['seq_name','pedestrian_id']).agg(lambda x: x.tolist())
df["global_id"] = range(1, len(df)+1)
Upvotes: 2
Views: 70
Reputation: 13255
df['global_id'] = df.pedestrian_id.ne(df.pedestrian_id.shift()).cumsum()
print(df)
seq_name label pedestrian_id frame_no global_id
0 1 crossing 1 0001.png 1
1 1 crossing 1 0002.png 1
2 1 crossing 1 0003.png 1
3 1 crossing 1 0004.png 1
4 1 crossing 1 0005.png 1
5 1 crossing 1 0006.png 1
6 1 crossing 1 0007.png 1
7 1 crossing 1 0008.png 1
8 1 crossing 1 0009.png 1
9 1 crossing 1 0010.png 1
10 1 crossing 2 0001.png 2
11 1 crossing 2 0002.png 2
12 1 crossing 2 0003.png 2
13 1 crossing 2 0004.png 2
14 1 crossing 2 0005.png 2
15 1 crossing 2 0006.png 2
16 1 crossing 2 0007.png 2
17 1 crossing 2 0008.png 2
18 1 crossing 2 0009.png 2
19 1 crossing 2 0010.png 2
20 2 crossing 1 0001.png 3
21 2 crossing 1 0002.png 3
22 2 crossing 1 0003.png 3
23 2 crossing 1 0004.png 3
24 2 crossing 1 0005.png 3
25 2 crossing 1 0006.png 3
26 2 crossing 1 0007.png 3
27 2 crossing 1 0008.png 3
28 2 crossing 1 0009.png 3
29 2 crossing 1 0010.png 3
Upvotes: 3
Reputation: 862481
Use GroupBy.ngroup
:
df["global_id"] = df.groupby(['seq_name','pedestrian_id']).ngroup() + 1
Another solutions with factorize
:
s = df['seq_name'].astype(str) +'_' +df['pedestrian_id'].astype(str)
s = list(map(tuple, df[['seq_name','pedestrian_id']].values.tolist()))
df["global_id"] = pd.factorize(s)[0] + 1
print (df)
seq_name label pedestrian_id frame_no global_id
0 1 crossing 1 0001.png 1
1 1 crossing 1 0002.png 1
2 1 crossing 1 0003.png 1
3 1 crossing 1 0004.png 1
4 1 crossing 1 0005.png 1
5 1 crossing 1 0006.png 1
6 1 crossing 1 0007.png 1
7 1 crossing 1 0008.png 1
8 1 crossing 1 0009.png 1
9 1 crossing 1 0010.png 1
10 1 crossing 2 0001.png 2
11 1 crossing 2 0002.png 2
12 1 crossing 2 0003.png 2
13 1 crossing 2 0004.png 2
14 1 crossing 2 0005.png 2
15 1 crossing 2 0006.png 2
16 1 crossing 2 0007.png 2
17 1 crossing 2 0008.png 2
18 1 crossing 2 0009.png 2
19 1 crossing 2 0010.png 2
20 2 crossing 1 0001.png 3
21 2 crossing 1 0002.png 3
22 2 crossing 1 0003.png 3
23 2 crossing 1 0004.png 3
24 2 crossing 1 0005.png 3
25 2 crossing 1 0006.png 3
26 2 crossing 1 0007.png 3
27 2 crossing 1 0008.png 3
28 2 crossing 1 0009.png 3
29 2 crossing 1 0010.png 3
Upvotes: 3