Mohammad.sh
Mohammad.sh

Reputation: 255

Add a column based on frequency for each group

I have a dataframe like this:

        vehicle_id   trip
0            0        0
1            0        0
2            0        0
3            0        1
4            0        1
5            1        0
6            1        0
7            1        1
8            1        1
9            1        1
10           1        1
11           1        1
12           1        2
13           2        0
14           2        1
15           2        2

I want to add a column that counts the frequency of each trip value for each 'vehicle id' group and drop the rows where the frequency is equal to 'one'. So after adding the column the frequency will be like this:

        vehicle_id   trip  frequency
0            0        0      3
1            0        0      3
2            0        0      3
3            0        1      2
4            0        1      2
5            1        0      2
6            1        0      2
7            1        1      5
8            1        1      5
9            1        1      5
10           1        1      5
11           1        1      5
12           1        2      1
13           2        0      1
14           2        1      1
15           2        2      1

and the final result will be like this

        vehicle_id   trip  frequency
0            0        0      3
1            0        0      3
2            0        0      3
3            0        1      2
4            0        1      2
5            1        0      2
6            1        0      2
7            1        1      5
8            1        1      5
9            1        1      5
10           1        1      5
11           1        1      5

what is the best solution for that? Also, what should I do if I intend to directly drop rows where the frequency is equal to 1 in each group (without adding the frequency column)?

Upvotes: 2

Views: 283

Answers (2)

L4ur3nt
L4ur3nt

Reputation: 118

Check the collab here : https://colab.research.google.com/drive/1AuBTuW7vWj1FbJzhPuE-QoLncoF5W_7W?usp=sharing

You can use df.groupby() :

df["frequency"] = df.groupby(["vehicle_id","trip"]).transform("count")

But of course you need to create the frequency column before_hand :

df["frequency"] = 0

If I take your dataframe as example this gives :

import pandas as pd


dict = {"vehicle_id" : [0,0,0,0,0,1,1,1,1,1,1,1],
        "trip" : [0,0,0,1,1,0,0,1,1,1,1,1]}

df = pd.DataFrame.from_dict(dict)

df["frequency"] = 0

df["frequency"] = df.groupby(["vehicle_id","trip"]).transform("count")

output :

Upvotes: 1

Andrej Kesely
Andrej Kesely

Reputation: 195553

Try:

df["frequency"] = (
    df.assign(frequency=0).groupby(["vehicle_id", "trip"]).transform("count")
)
print(df[df.frequency > 1])

Prints:

    vehicle_id  trip  frequency
0            0     0          3
1            0     0          3
2            0     0          3
3            0     1          2
4            0     1          2
5            1     0          2
6            1     0          2
7            1     1          5
8            1     1          5
9            1     1          5
10           1     1          5
11           1     1          5

Upvotes: 0

Related Questions