Kong
Kong

Reputation: 2410

Pandas - groupby on the concatenation of 2 columns

I have a dataframe below where seq_name is the name of the video, pedestrian_id is the name of the pedestrian in that video and frame_no are the frames that contain said pedestrian in the video. Thus for the example below, I have 2 videos 0001 and 0002 and 3 total pedestrians.

I am trying to group them to assign them a global_id based on this information such that each unique pedestrian is given an ID from 1 to total number of pedestrians.

       seq_name     label pedestrian_id  frame_no  global_id
0          0001  crossing          0001  0001.png         -1
1          0001  crossing          0001  0002.png         -1
2          0001  crossing          0001  0003.png         -1
3          0001  crossing          0001  0004.png         -1
4          0001  crossing          0001  0005.png         -1
5          0001  crossing          0001  0006.png         -1
6          0001  crossing          0001  0007.png         -1
7          0001  crossing          0001  0008.png         -1
8          0001  crossing          0001  0009.png         -1
9          0001  crossing          0001  0010.png         -1
10         0001  crossing          0002  0001.png         -1
11         0001  crossing          0002  0002.png         -1
12         0001  crossing          0002  0003.png         -1
13         0001  crossing          0002  0004.png         -1
14         0001  crossing          0002  0005.png         -1
15         0001  crossing          0002  0006.png         -1
16         0001  crossing          0002  0007.png         -1
17         0001  crossing          0002  0008.png         -1
18         0001  crossing          0002  0009.png         -1
19         0001  crossing          0002  0010.png         -1
20         0002  crossing          0001  0001.png         -1
21         0002  crossing          0001  0002.png         -1
22         0002  crossing          0001  0003.png         -1
23         0002  crossing          0001  0004.png         -1
24         0002  crossing          0001  0005.png         -1
25         0002  crossing          0001  0006.png         -1
26         0002  crossing          0001  0007.png         -1
27         0002  crossing          0001  0008.png         -1
28         0002  crossing          0001  0009.png         -1
29         0002  crossing          0001  0010.png         -1

The final dataframe should thus look like this

       seq_name     label pedestrian_id  frame_no  global_id
0          0001  crossing          0001  0001.png         1
1          0001  crossing          0001  0002.png         1
2          0001  crossing          0001  0003.png         1
3          0001  crossing          0001  0004.png         1
4          0001  crossing          0001  0005.png         1
5          0001  crossing          0001  0006.png         1
6          0001  crossing          0001  0007.png         1
7          0001  crossing          0001  0008.png         1
8          0001  crossing          0001  0009.png         1
9          0001  crossing          0001  0010.png         1
10         0001  crossing          0002  0001.png         2
11         0001  crossing          0002  0002.png         2
12         0001  crossing          0002  0003.png         2
13         0001  crossing          0002  0004.png         2
14         0001  crossing          0002  0005.png         2
15         0001  crossing          0002  0006.png         2
16         0001  crossing          0002  0007.png         2
17         0001  crossing          0002  0008.png         2
18         0001  crossing          0002  0009.png         2
19         0001  crossing          0002  0010.png         2
20         0002  crossing          0001  0001.png         3
21         0002  crossing          0001  0002.png         3
22         0002  crossing          0001  0003.png         3
23         0002  crossing          0001  0004.png         3
24         0002  crossing          0001  0005.png         3
25         0002  crossing          0001  0006.png         3
26         0002  crossing          0001  0007.png         3
27         0002  crossing          0001  0008.png         3
28         0002  crossing          0001  0009.png         3
29         0002  crossing          0001  0010.png         3

Looks very easy but I am new to pandas and am having a lot of trouble. The only thing I have done is to group them based on seq_name and pedestrian_id before appending a new column. But when I printed the data it no longer looks similar to the one Id like above plus it looks like global_id has been appended to a new row instead of a new column. Would like some help.

df = df.groupby(['seq_name','pedestrian_id']).agg(lambda x: x.tolist())
df["global_id"] = range(1, len(df)+1)

Upvotes: 2

Views: 70

Answers (2)

Space Impact
Space Impact

Reputation: 13255

Use ne with shift and cumsum:

df['global_id'] = df.pedestrian_id.ne(df.pedestrian_id.shift()).cumsum()

print(df)
    seq_name     label  pedestrian_id  frame_no  global_id
0          1  crossing              1  0001.png          1
1          1  crossing              1  0002.png          1
2          1  crossing              1  0003.png          1
3          1  crossing              1  0004.png          1
4          1  crossing              1  0005.png          1
5          1  crossing              1  0006.png          1
6          1  crossing              1  0007.png          1
7          1  crossing              1  0008.png          1
8          1  crossing              1  0009.png          1
9          1  crossing              1  0010.png          1
10         1  crossing              2  0001.png          2
11         1  crossing              2  0002.png          2
12         1  crossing              2  0003.png          2
13         1  crossing              2  0004.png          2
14         1  crossing              2  0005.png          2
15         1  crossing              2  0006.png          2
16         1  crossing              2  0007.png          2
17         1  crossing              2  0008.png          2
18         1  crossing              2  0009.png          2
19         1  crossing              2  0010.png          2
20         2  crossing              1  0001.png          3
21         2  crossing              1  0002.png          3
22         2  crossing              1  0003.png          3
23         2  crossing              1  0004.png          3
24         2  crossing              1  0005.png          3
25         2  crossing              1  0006.png          3
26         2  crossing              1  0007.png          3
27         2  crossing              1  0008.png          3
28         2  crossing              1  0009.png          3
29         2  crossing              1  0010.png          3

Upvotes: 3

jezrael
jezrael

Reputation: 862481

Use GroupBy.ngroup:

df["global_id"] = df.groupby(['seq_name','pedestrian_id']).ngroup() + 1

Another solutions with factorize:

s = df['seq_name'].astype(str) +'_' +df['pedestrian_id'].astype(str)
s = list(map(tuple, df[['seq_name','pedestrian_id']].values.tolist()))
df["global_id"] = pd.factorize(s)[0] + 1

print (df)
    seq_name     label  pedestrian_id  frame_no  global_id
0          1  crossing              1  0001.png          1
1          1  crossing              1  0002.png          1
2          1  crossing              1  0003.png          1
3          1  crossing              1  0004.png          1
4          1  crossing              1  0005.png          1
5          1  crossing              1  0006.png          1
6          1  crossing              1  0007.png          1
7          1  crossing              1  0008.png          1
8          1  crossing              1  0009.png          1
9          1  crossing              1  0010.png          1
10         1  crossing              2  0001.png          2
11         1  crossing              2  0002.png          2
12         1  crossing              2  0003.png          2
13         1  crossing              2  0004.png          2
14         1  crossing              2  0005.png          2
15         1  crossing              2  0006.png          2
16         1  crossing              2  0007.png          2
17         1  crossing              2  0008.png          2
18         1  crossing              2  0009.png          2
19         1  crossing              2  0010.png          2
20         2  crossing              1  0001.png          3
21         2  crossing              1  0002.png          3
22         2  crossing              1  0003.png          3
23         2  crossing              1  0004.png          3
24         2  crossing              1  0005.png          3
25         2  crossing              1  0006.png          3
26         2  crossing              1  0007.png          3
27         2  crossing              1  0008.png          3
28         2  crossing              1  0009.png          3
29         2  crossing              1  0010.png          3

Upvotes: 3

Related Questions