lmiguelvargasf
lmiguelvargasf

Reputation: 70003

How to groupby and apply a function in pandas

I have the following dataframe df:

    period  remaining_et_months  property_id  beds
0        0                    0          329     1
1        1                    0          329     1
2        2                    1          329     1
3        3                    2          329     1
4        3                    2          329     1
5        4                    3          329     1
6        4                    3          329     1
7        4                    3          329     1
8        5                    4          329     1
9        5                    4          329     1
10       5                    4          329     1
11       5                    4          329     1

and I need to group this dataframe and apply a function so that I am able to compute a column called dist_period:

g = df.groupby(['property_id', 'beds', 'period'])
g.apply(some_function)

I need to group by property_id, beds, and period, because the dataframe is larger than the one in this example.

I don't have a clear idea how to achieve this, but what I want the column dist_period to be like this:

    period  remaining_et_months  dist_period
0        0                    0            0
1        1                    0            1 
2        2                    1            1
3        3                    2            1
4        3                    2            2
5        4                    3            1
6        4                    3            2
7        4                    3            3
8        5                    4            1
9        5                    4            2
10       5                    4            3
11       5                    4            4

Notice that when the group has just one element the value of dist_period is period - remaining_et_months, but when the group has more than one element (see when period is 3, 4 or 5), then a count starting at one is perform.

Upvotes: 0

Views: 76

Answers (1)

user3212593
user3212593

Reputation: 496

Get the cumcount for the counting on multiple row groups (edit: note that this starts counting from 0):

df['dist_period'] = df.groupby(['property_id', 'beds', 'period']).cumcount()

Then you just add the difference between period and remaining_et_months:

df['dist_period'] = df['period'] - df['remaining_et_months'] + df['dist_period']

Upvotes: 1

Related Questions