Andrew Becker

Reputation: 9

Extract rolling maximum value based on column value

I have some data that I've performed cluster analysis on and need to find breakpoints based on population density. The clusters overlap heavily, so I've sorted the data by population density and want to extract the last value before the 'cluster' column switches to another cluster. Basically the data looks like this:

cluster  PopDens
1        5
1        7
2        8
2        9
1        10
1        12
3        14
1        16

And I would want it to return the following:

cluster  PopDens
1        7
2        9
1        12
3        14
1        16

How would I go about achieving this in R?

Upvotes: 0

Views: 98

Answers (3)

talat

Reputation: 70336

In base R it could be done using:

x[cumsum(rle(x$cluster)$lengths),]
#  cluster PopDens
#2       1       7
#4       2       9
#6       1      12
#7       3      14
#8       1      16

This also translates quite directly to data.table in case you are interested:

library(data.table)
setDT(x)[cumsum(rle(cluster)$lengths)]

And of course we can also do it in dplyr:

library(dplyr)
slice(x, cumsum(rle(cluster)$lengths))
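
In case it helps to see why the base R one-liner works, here is a rough step-by-step sketch. The data is copied from the question; the names `runs`, `last_rows` and `result` are only illustrative and not part of the original answer.

```r
# Sample data copied from the question
x <- data.frame(
  cluster = c(1, 1, 2, 2, 1, 1, 3, 1),
  PopDens = c(5, 7, 8, 9, 10, 12, 14, 16)
)

# rle() collapses the column into runs of consecutive identical values
runs <- rle(x$cluster)
runs$lengths   # 2 2 2 1 1  (length of each run)
runs$values    # 1 2 1 3 1  (cluster value of each run)

# The cumulative sum of the run lengths is the row number where each run ends
last_rows <- cumsum(runs$lengths)   # 2 4 6 7 8

# Subsetting by those row numbers keeps the last row of every run
result <- x[last_rows, ]
result
```

Because the runs are found by position, this relies on the data already being sorted by PopDens, as described in the question.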

Upvotes: 3

Jaap

Reputation: 83275

Another data.table solution:

library(data.table)
setDT(df)[df[, tail(.I,1), rleid(cluster)]$V1]

which gives:

   cluster PopDens
1:       1       7
2:       2       9
3:       1      12
4:       3      14
5:       1      16
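
For readers less familiar with data.table, a small sketch of the two pieces involved may be useful: `rleid()` numbers the consecutive runs, and `.I` holds the row numbers within each group. The names `run_id`, `run` and `idx` are made up for this illustration, and `.I[.N]` is used here as an assumed equivalent of `tail(.I, 1)`.

```r
library(data.table)

# Sample data copied from the question
df <- data.table(
  cluster = c(1, 1, 2, 2, 1, 1, 3, 1),
  PopDens = c(5, 7, 8, 9, 10, 12, 14, 16)
)

# rleid() gives every run of consecutive identical values its own id
run_id <- rleid(df$cluster)   # 1 1 2 2 3 3 4 5

# Within each run, .I[.N] is the row number of the last row of that run
idx <- df[, .I[.N], by = .(run = rleid(cluster))]$V1   # 2 4 6 7 8

# Subsetting by those row numbers returns the last row of every run
df[idx]
```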

Upvotes: 0

Uwe

Reputation: 42602

With data.table, the rleid() function can be used for grouping:

library(data.table)
setDT(DF)[, .(PopDens = last(PopDens)), .(rleid(cluster), cluster)][, rleid := NULL][]
#   cluster PopDens
#1:       1       7
#2:       2       9
#3:       1      12
#4:       3      14
#5:       1      16

There are alternative ways to achieve the same result:

DF[, .(PopDens = PopDens[.N]), .(rleid(cluster), cluster)][, rleid := NULL][]
DF[, .(PopDens = tail(PopDens, 1)), .(rleid(cluster), cluster)][, rleid := NULL][]
DF[, .SD[.N], .(rleid(cluster), cluster)][, rleid := NULL][]
DF[, tail(.SD, 1), .(rleid(cluster), cluster)][, rleid := NULL][]
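
As a quick sanity check that these variants agree, here is a sketch comparing two of them on the data from the question. The grouping column is named `run` explicitly here, which is my own choice for the illustration; `a` and `b` are likewise just placeholder names.

```r
library(data.table)

# Sample data copied from the question
DF <- data.table(
  cluster = c(1, 1, 2, 2, 1, 1, 3, 1),
  PopDens = c(5, 7, 8, 9, 10, 12, 14, 16)
)

# Variant 1: pick the last PopDens of each run via .N (the group size)
a <- DF[, .(PopDens = PopDens[.N]), by = .(run = rleid(cluster), cluster)][, run := NULL][]

# Variant 2: pick the last row of .SD (all non-grouping columns) in each run
b <- DF[, .SD[.N], by = .(run = rleid(cluster), cluster)][, run := NULL][]

# Both should contain the same rows and columns
all.equal(a, b)
```

The `.SD` variants would carry along any additional columns as well, whereas the `PopDens = ...` variants only return the columns named explicitly.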

Upvotes: 0
