Split a data frame at the date where another variable reaches its max/min

Question

I have a dataset that is similar to the following:

df <- data.frame( 
                date = c("2020-02-01", "2020-02-02", "2020-02-03", "2020-02-04", "2020-02-05", "2020-02-06"),
               value = c(0,1,2,7,3,4))

I would like to split my data frame into two smaller data frames: the first should contain the rows before value reaches its maximum (i.e. 7), and the second should contain the rest of the original data frame, starting with the row of the maximum, as follows:

df1 <- data.frame(
                 date = c("2020-02-01", "2020-02-02", "2020-02-03"),
                 value = c(0,1,2)
                 )
df2 <- data.frame(
                 date = c("2020-02-04", "2020-02-05", "2020-02-06"),
                 value = c(7, 3, 4)
                 )

*** Second part of the question: Now assume that I have the following dataset, which includes more than one object, each identified by an ID. I would like to do the same thing as explained above for every object (ID).

df <- data.frame( ID = c(1,1,1,1,1,1,2,2,2,2),
                date = c("2020-02-01", "2020-02-02", "2020-02-03", "2020-02-04", "2020-02-05", "2020-02-06", "2020-02-01", "2020-02-02","2020-02-03", "2020-02-04"),
               value = c(0,1,2,7,3,4,10,16,11,12))

Thanks for your time.

Ronak Shah · Accepted Answer

You can use which.max to get the index of the maximum value and then use that index to subset the data frame.

ind <- which.max(df$value)
df1 <- df[seq_len(ind - 1), ]
df2 <- df[ind:nrow(df), ]

df1
# A tibble: 3 x 2
#  date       value
#  <chr>      <dbl>
#1 2020-02-01     0
#2 2020-02-02     1
#3 2020-02-03     2

df2
# A tibble: 3 x 2
#  date       value
#  <chr>      <dbl>
#1 2020-02-04     7
#2 2020-02-05     3
#3 2020-02-06     4
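The title also asks about the minimum. Below is a minimal base R sketch, assuming the same df as in the question. The helper split_at_extreme() is a hypothetical name introduced only for illustration; it wraps the which.max approach above and takes the locating function as an argument, so passing which.min splits at the minimum instead. Both which.max and which.min return the first occurrence when there are ties.

```r
# Sketch: split a data frame at the row where a column reaches its max (or min).
# split_at_extreme() is a hypothetical helper, not part of any package.
split_at_extreme <- function(df, col = "value", locate = which.max) {
  ind <- locate(df[[col]])  # index of the first max (or min)
  list(
    before = df[seq_len(ind - 1), , drop = FALSE],  # rows before the extreme
    after  = df[ind:nrow(df), , drop = FALSE]       # the extreme row and everything after
  )
}

df <- data.frame(
  date  = c("2020-02-01", "2020-02-02", "2020-02-03", "2020-02-04", "2020-02-05", "2020-02-06"),
  value = c(0, 1, 2, 7, 3, 4)
)

by_max <- split_at_extreme(df)                      # splits at value == 7
by_min <- split_at_extreme(df, locate = which.min)  # splits at value == 0
```

In this data the minimum is the very first row, so by_min$before comes back as a zero-row data frame and by_min$after holds all six rows.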

If there are a lot of IDs and we have to do this for each ID, we can create a list of data frames.

library(dplyr)

result <- df %>%
  group_split(ID) %>%
  purrr::map(~ .x %>%
    # Split each ID at its max; the logical helper column is dropped via .keep = FALSE
    group_split(row_number() < which.max(value), .keep = FALSE))

## In case someone is interested, you can make a single data frame from the list above as follows:
result_df <- result %>%
  bind_rows()
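If you prefer to avoid dplyr and purrr, the same per-ID split can be sketched in base R with split() and lapply(). This is a swapped-in technique, not the answer's method, and the element names before and after are my own choice. One reason to label the pieces explicitly: group_split() should order its output by the logical grouping value, so FALSE (the rows from the max onward) comes first and TRUE (the rows before it) second, which is easy to get backwards.

```r
# Sketch: split each ID's rows at that ID's maximum, using only base R.
df <- data.frame(
  ID    = c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2),
  date  = c("2020-02-01", "2020-02-02", "2020-02-03", "2020-02-04", "2020-02-05",
            "2020-02-06", "2020-02-01", "2020-02-02", "2020-02-03", "2020-02-04"),
  value = c(0, 1, 2, 7, 3, 4, 10, 16, 11, 12)
)

# One list element per ID, each holding labelled "before" / "after" pieces.
result <- lapply(split(df, df$ID), function(d) {
  ind <- which.max(d$value)  # position of the max within this ID
  list(
    before = d[seq_len(ind - 1), , drop = FALSE],  # rows before the max
    after  = d[ind:nrow(d), , drop = FALSE]        # the max row and the rest
  )
})

# Stack all "before" pieces, and all "after" pieces, into single data frames.
before_all <- do.call(rbind, lapply(result, `[[`, "before"))
after_all  <- do.call(rbind, lapply(result, `[[`, "after"))
```

For ID 2 the maximum (16) is in the second row, so its before piece contains only the row with value 10.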
