Reputation: 433

How do I output the max value within a range of rows in a data frame?

Suppose I have the following data and data frame:

sample_data <- c(1:14)
sample_data2 <- c(NA,NA,NA, "break", NA, NA, "break", NA,NA,NA,NA,NA,NA,"break")
sample_df <- as.data.frame(sample_data)
sample_df$sample_data2 <- sample_data2

When I print this data frame, the results are as follows:

   sample_data sample_data2
1            1         <NA>
2            2         <NA>
3            3         <NA>
4            4        break
5            5         <NA>
6            6         <NA>
7            7        break
8            8         <NA>
9            9         <NA>
10          10         <NA>
11          11         <NA>
12          12         <NA>
13          13         <NA>
14          14        break

How would I program it so that at every "break", it outputs the max from that row up? For instance, I would want the code to output the set of (4,7,14). Additionally, I would want it so that it only finds the max value between up to the next "break" interval. I apologize in advance if I used any incorrect nomenclature.

Upvotes: 2

Answers (5)

tmfmnk

Reputation: 40171

Depending whether you want to assess the maximum "sample_data" number between all "sample_data2" == break including (e.g. row 1 to row 4) or excluding (e.g. row 1 to row 3) the given "sample_data2" == break row, you can do something like this with tidyverse:

Excluding the break rows:

sample_df %>%
 group_by(sample_data2) %>%
 mutate(temp = ifelse(is.na(sample_data2), NA_character_,  paste0(gl(length(sample_data2), 1)))) %>%
 ungroup() %>%
 fill(temp, .direction = "up") %>%
 filter(is.na(sample_data2)) %>%
 group_by(temp) %>%
 summarise(res = max(sample_data))

  temp    res
  <chr> <dbl>
1 1        3.
2 2        6.
3 3       13.

Including the break rows:

sample_df %>%
 group_by(sample_data2) %>%
 mutate(temp = ifelse(is.na(sample_data2), NA_character_,  paste0(gl(length(sample_data2), 1)))) %>%
 ungroup() %>%
 fill(temp, .direction = "up") %>%
 group_by(temp) %>%
 summarise(res = max(sample_data)) 

  temp    res
  <chr> <dbl>
1 1        4.
2 2        7.
3 3       14.

Both of the codes create an ID variable called "temp" using gl() for "sample_data2" == break and then fill up the NA rows with that ID. Then, the first code filters out the "sample_data2" == break rows and assess the maximum "sample_data" values per group, while the second assess the maximum "sample_data" values per group including the "sample_data2" == break rows.

Upvotes: 1

jsizzle

Reputation: 78

Looks like there are lots of different ways of doing this. This is how I went about it:

rows <- which(sample_data2 == "break") #Get the row indices for where "break" appears

findmax <- function(maxrow) {

  max(sample_data[1:maxrow])

} #Create a function that returns the max "up to" a given row

sapply(rows, findmax) #apply it for each of your rows

### [1]  4  7 14

Note that this works "up to" the given row. To get the maximum value between the two breaks would probably be easier with one of the other solutions, but you could also do it by looking at the j-1 row to jth row from the rows object.

Upvotes: 1

Rui Barradas

Reputation: 76651

Here are 2 ways with base R. The trick is to define a grouping variable, grp.

grp <- !is.na(sample_df$sample_data2) & sample_df$sample_data2 == "break"
grp <- rev(cumsum(rev(grp))) 
grp <- -1*grp + max(grp)

tapply(sample_df$sample_data, grp, max, na.rm = TRUE)
aggregate(sample_data ~ grp, sample_df, max, na.rm = TRUE)

Data.
This is simplified data creation code.

sample_data <- 1:14
sample_data2 <- c(NA,NA,NA, "break", NA, NA, "break", NA,NA,NA,NA,NA,NA,"break")
sample_df <- data.frame(sample_data, sample_data2)

Upvotes: 1

JBGruber

Reputation: 12470

I construct the groups looking for the word "break" and then move the results one row up. Then some dplyr commands to get max of every group.

library(dplyr)
sample_df_new <- sample_df %>% 
  mutate(group = c(1, cumsum(grepl("break", sample_data2)) + 1)[1:length(sample_data2)]) %>% 
  group_by(group) %>% 
  summarise(group_max = max(sample_data))

> sample_df_new
# A tibble: 3 x 2
  group group_max
  <dbl>     <dbl>
1     1         4
2     2         7
3     3        14

Upvotes: 2

denis

Reputation: 5673

I have an answer using data.table:

library(data.table)

sample_df <- setDT(sample_df)
sample_df[,group := (rleid(sample_data2)-0.5)%/%2]
sample_df[,.(maxvalues = max(sample_data)),by = group]

   group maxvalues
1:     0         4
2:     1         7
3:     2        14

The tricky part is (rleid(sample_data2)-0.5)%/%2: rleid create an increasing index to each change :

    sample_data sample_data2 rleid
 1:           1           NA     1
 2:           2           NA     1
 3:           3           NA     1
 4:           4        break     2
 5:           5           NA     3
 6:           6           NA     3
 7:           7        break     4
 8:           8           NA     5
 9:           9           NA     5
10:          10           NA     5
11:          11           NA     5
12:          12           NA     5
13:          13           NA     5
14:          14        break     6

If you keep the entire part of that index - 0.5, you have a constant index for the rows you want, that you can use for grouping operation:

    sample_data sample_data2 group
 1:           1           NA     0
 2:           2           NA     0
 3:           3           NA     0
 4:           4        break     0
 5:           5           NA     1
 6:           6           NA     1
 7:           7        break     1
 8:           8           NA     2
 9:           9           NA     2
10:          10           NA     2
11:          11           NA     2
12:          12           NA     2
13:          13           NA     2
14:          14        break     2

Then it is just taking the maximum for each group. You can easily translate it into dplyr if it is easier for you

Upvotes: 2

How do I output the max value within a range of rows in a data frame?

Answers (5)

Related Questions