kang

Reputation: 19

How to apply a function by grouping by specific locations in pandas?

logged_at           type    name    values
2020-08-17 00:02:22 weak    AA      55
2020-08-17 00:12:20 weak    AA      54
2020-08-17 00:22:24 weak    AA      53
2020-08-17 00:32:25 weak    AA      50
2020-08-17 00:42:28 strong  AA      44
2020-08-17 00:52:22 strong  AA      33
2020-08-17 01:02:20 strong  AA      32
2020-08-17 01:22:24 weak    AA      56
2020-08-17 01:32:25 weak    AA      55
2020-08-17 01:42:28 weak    AA      43

Above is the dataframe. For each name, I want to treat each consecutive run of rows with the same type as a subset, and compute the first value minus the last value of each subset.

The desired result looks like this:

logged_at           type    name    values  rank
2020-08-17 00:02:22 weak    AA      5        1
2020-08-17 00:42:28 strong  AA      12       2
2020-08-17 01:22:24 weak    AA      13       3

Whenever the type changes, a rank value should be assigned so that each subset can be distinguished. What I have in mind is to iterate with itertuples, compare each row's type against the previous one, increment the rank whenever it differs, collect the ranks in a list, and add that list as a new column. I don't know if this is efficient. There are about 300 million rows, so I am looking for the most efficient method.
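The rank-assignment loop described above can also be done without Python-level iteration: comparing the type column against its shifted copy marks where each run starts, and a cumulative sum of those marks yields the rank. A minimal sketch with made-up sample data:

```python
import pandas as pd

df = pd.DataFrame({
    "type": ["weak", "weak", "strong", "strong", "weak"],
    "values": [55, 50, 44, 32, 56],
})

# A new run starts wherever the type differs from the previous row;
# the cumulative sum of those change points is the run id / rank.
df["rank"] = df["type"].ne(df["type"].shift()).cumsum()
print(df["rank"].tolist())  # [1, 1, 2, 2, 3]
```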

Upvotes: 2

Views: 53

Answers (2)

sammywemmy

Reputation: 28644

You can break it into stages:

# get the first row of each run (type differs from the previous row):
top = df.loc[df["type"].ne(df["type"].shift())]

# get the last row of each run (type differs from the next row):
bottom = df.loc[df["type"].ne(df["type"].shift(-1))]

# get the difference in values and generate the rank:
top.assign(values=top["values"].array - bottom["values"].array,
           rank=range(1, 1 + len(top)))

    logged_at          type   name  values  rank
0   2020-08-17 00:02:22 weak    AA      5   1
4   2020-08-17 00:42:28 strong  AA      12  2
7   2020-08-17 01:22:24 weak    AA      13  3

You can reset the index. However, with 300 million rows, I do not think Pandas is the right choice.
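For comparison, the same result can be produced in a single groupby pass over run ids (the shift/cumsum idiom for labelling consecutive runs). This is a sketch using the question's sample data, not a claim about which approach is faster at 300 million rows:

```python
import pandas as pd

df = pd.DataFrame({
    "logged_at": pd.to_datetime([
        "2020-08-17 00:02:22", "2020-08-17 00:12:20", "2020-08-17 00:22:24",
        "2020-08-17 00:32:25", "2020-08-17 00:42:28", "2020-08-17 00:52:22",
        "2020-08-17 01:02:20", "2020-08-17 01:22:24", "2020-08-17 01:32:25",
        "2020-08-17 01:42:28",
    ]),
    "type": ["weak"] * 4 + ["strong"] * 3 + ["weak"] * 3,
    "name": ["AA"] * 10,
    "values": [55, 54, 53, 50, 44, 33, 32, 56, 55, 43],
})

# Label each consecutive run of `type` with an increasing id; that id
# is exactly the rank the question asks for.
rank = df["type"].ne(df["type"].shift()).cumsum().rename("rank")

out = (
    df.groupby(rank)
      .agg(logged_at=("logged_at", "first"),
           type=("type", "first"),
           name=("name", "first"),
           values=("values", lambda s: s.iloc[0] - s.iloc[-1]))
      .reset_index()
)
print(out)
```

The lambda computes first minus last within each run, matching the 5 / 12 / 13 in the desired output.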

Upvotes: 1

Gilseung Ahn

Reputation: 2614

If the dataframe is sorted by time, just remove duplicates as follows.

df.drop_duplicates(subset='logged_at', keep='first')

Upvotes: 0
