Reputation: 43
I have a dataset that is similar to the following:
df <- data.frame(
date = c("2020-02-01", "2020-02-02", "2020-02-03", "2020-02-04", "2020-02-05", "2020-02-06"),
value = c(0,1,2,7,3,4))
I would like to split my data frame into two smaller data frames such that the first data frame includes a part of the original data frame before the value reaches its max (i.e. 7) and the second part of the data frame includes the rest of the original data frame as follows:
df1 <- data.frame(
date = c("2020-02-01", "2020-02-02", "2020-02-03"),
value = c(0,1,2)
)
df2 <- data.frame(
date = c("2020-02-04", "2020-02-05", "2020-02-06"),
value = c(7, 3, 4)
)
*** The 2nd part of the question Now assume that I have the following dataset including more than one object identified by IDs. So, I would like to the same thing as explained above and applied to all objects (IDs)
df <- data.frame( ID = c(1,1,1,1,1,1,2,2,2,2),
date = c("2020-02-01", "2020-02-02", "2020-02-03", "2020-02-04", "2020-02-05", "2020-02-06", "2020-02-01", "2020-02-02","2020-02-03", "2020-02-04"),
value = c(0,1,2,7,3,4,10,16,11,12))
Thanks for your time.
Upvotes: 1
Views: 147
Reputation: 388982
You can use which.max
to get the index of max
value and use it to subset the dataframe.
ind <- which.max(df$value)
df1 <- df[seq_len(ind - 1), ]
df2 <- df[ind:nrow(df), ]
df1
# A tibble: 3 x 2
# date value
# <chr> <dbl>
#1 2020-02-01 0
#2 2020-02-02 1
#3 2020-02-03 2
df2
# A tibble: 3 x 2
# date value
# <chr> <dbl>
#1 2020-02-04 7
#2 2020-02-05 3
#3 2020-02-06 4
We could create a list of dataframes if there are lot of ID
's and we have to do this for each ID
.
result <- df %>%
group_split(ID) %>%
purrr::map(~{.x %>%
group_split(row_number() < which.max(value), .keep = FALSE)})
## In case, someone is interested you could make a data frame from the list above as follows:
result_df <- result %>%
bind_rows()
Upvotes: 1
Reputation: 11584
Another approach using base R:
> df
date value
1 2020-02-01 0
2 2020-02-02 1
3 2020-02-03 2
4 2020-02-04 7
5 2020-02-05 3
6 2020-02-06 4
> df1 <- df[1:(which(df$value == max(df$value)) - 1), ]
> df2 <- df[which(df$value == max(df$value)):nrow(df), ]
> df1
date value
1 2020-02-01 0
2 2020-02-02 1
3 2020-02-03 2
> df2
date value
4 2020-02-04 7
5 2020-02-05 3
6 2020-02-06 4
>
For the grouped data:
> mylist <- df %>% split(f = df$ID)
> mylist
$`1`
ID date value
1 1 2020-02-01 0
2 1 2020-02-02 1
3 1 2020-02-03 2
4 1 2020-02-04 7
5 1 2020-02-05 3
6 1 2020-02-06 4
$`2`
ID date value
7 2 2020-02-01 10
8 2 2020-02-02 16
9 2 2020-02-03 11
10 2 2020-02-04 12
> split_list <- lapply(mylist, function(x) x[1:(which.max(x$value) - 1),])
> split_list <- append(split_list, lapply(mylist, function(x) x[which.max(x$value): nrow(x),]))
> split_list
$`1`
ID date value
1 1 2020-02-01 0
2 1 2020-02-02 1
3 1 2020-02-03 2
$`2`
ID date value
7 2 2020-02-01 10
$`1`
ID date value
4 1 2020-02-04 7
5 1 2020-02-05 3
6 1 2020-02-06 4
$`2`
ID date value
8 2 2020-02-02 16
9 2 2020-02-03 11
10 2 2020-02-04 12
>
Upvotes: 1