H Hosseini
H Hosseini

Reputation: 43

split a data frame using after a date where the value of another variable reaches to max/min on that date

I have a dataset that is similar to the following:

df <- data.frame( 
                date = c("2020-02-01", "2020-02-02", "2020-02-03", "2020-02-04", "2020-02-05", "2020-02-06"),
               value = c(0,1,2,7,3,4))

I would like to split my data frame into two smaller data frames such that the first data frame includes a part of the original data frame before the value reaches its max (i.e. 7) and the second part of the data frame includes the rest of the original data frame as follows:

df1 <- data.frame(
                 date = c("2020-02-01", "2020-02-02", "2020-02-03"),
                 value = c(0,1,2)
                 )
df2 <- data.frame(
                 date = c("2020-02-04", "2020-02-05", "2020-02-06"),
                 value = c(7, 3, 4)
                 )

*** The 2nd part of the question Now assume that I have the following dataset including more than one object identified by IDs. So, I would like to the same thing as explained above and applied to all objects (IDs)

df <- data.frame( ID = c(1,1,1,1,1,1,2,2,2,2),
                date = c("2020-02-01", "2020-02-02", "2020-02-03", "2020-02-04", "2020-02-05", "2020-02-06", "2020-02-01", "2020-02-02","2020-02-03", "2020-02-04"),
               value = c(0,1,2,7,3,4,10,16,11,12))

Thanks for your time.

Upvotes: 1

Views: 147

Answers (2)

Ronak Shah
Ronak Shah

Reputation: 388982

You can use which.max to get the index of max value and use it to subset the dataframe.

ind <- which.max(df$value)
df1 <- df[seq_len(ind - 1), ]
df2 <- df[ind:nrow(df), ]

df1
# A tibble: 3 x 2
#  date       value
#  <chr>      <dbl>
#1 2020-02-01     0
#2 2020-02-02     1
#3 2020-02-03     2

df2
# A tibble: 3 x 2
#  date       value
#  <chr>      <dbl>
#1 2020-02-04     7
#2 2020-02-05     3
#3 2020-02-06     4

We could create a list of dataframes if there are lot of ID's and we have to do this for each ID.

result <- df %>%
            group_split(ID) %>%
            purrr::map(~{.x %>% 
               group_split(row_number() < which.max(value), .keep = FALSE)})

## In case, someone is interested you could make a data frame from the list above as follows: 
result_df <- result %>%
bind_rows()


Upvotes: 1

Karthik S
Karthik S

Reputation: 11584

Another approach using base R:

> df
        date value
1 2020-02-01     0
2 2020-02-02     1
3 2020-02-03     2
4 2020-02-04     7
5 2020-02-05     3
6 2020-02-06     4
> df1 <-  df[1:(which(df$value == max(df$value)) - 1), ]
> df2 <- df[which(df$value == max(df$value)):nrow(df), ]
> df1
        date value
1 2020-02-01     0
2 2020-02-02     1
3 2020-02-03     2
> df2
        date value
4 2020-02-04     7
5 2020-02-05     3
6 2020-02-06     4
> 

For the grouped data:

> mylist <- df %>% split(f = df$ID)
> mylist
$`1`
  ID       date value
1  1 2020-02-01     0
2  1 2020-02-02     1
3  1 2020-02-03     2
4  1 2020-02-04     7
5  1 2020-02-05     3
6  1 2020-02-06     4

$`2`
   ID       date value
7   2 2020-02-01    10
8   2 2020-02-02    16
9   2 2020-02-03    11
10  2 2020-02-04    12

> split_list <- lapply(mylist, function(x) x[1:(which.max(x$value) - 1),])
> split_list <- append(split_list, lapply(mylist, function(x) x[which.max(x$value): nrow(x),]))
> split_list
$`1`
  ID       date value
1  1 2020-02-01     0
2  1 2020-02-02     1
3  1 2020-02-03     2

$`2`
  ID       date value
7  2 2020-02-01    10

$`1`
  ID       date value
4  1 2020-02-04     7
5  1 2020-02-05     3
6  1 2020-02-06     4

$`2`
   ID       date value
8   2 2020-02-02    16
9   2 2020-02-03    11
10  2 2020-02-04    12

> 

Upvotes: 1

Related Questions