Reputation: 509
I have a list of data frames similar to the reprex below but with 100+ columns:
# reproducible example
df <- data.frame(
Name = c("Name1", "Name2", "Name3", "Name4", "Name5"),
Date = c("2018-01-01", "2018-01-02"),
Value1 = c(rnorm(5, 2, 3), rnorm(5, 4, 1)),
Value2 = c(rnorm(5, 12, 4), rnorm(5, 5, 8)),
Value3 = c(rnorm(5, 22, 13), rnorm(5, 7, 10))
)
# transform data frame into list
df <- split(df, df$Name)
For each data frame in the list, I would like to replace the values in the last row with the values from the row before it. For example, for each data frame in the list, I would like to replace [2, 3:5] with [1, 3:5].
> tail(df[["Name1"]], n = 2)
Name Date Value1 Value2 Value3
1 Name1 2018-01-01 0.9184539 15.658510 29.219707
2 Name1 2018-01-02 3.8875463 3.628546 9.777399
I'm not sure if transforming my data frame into a list is the best way to go about this, so any other suggestions are welcome. I tried tackling this as outlined below, but my attempt only replaces the last row of the entire data frame with the second-to-last row, rather than the last row of each Name group.
My Attempt
# reproducible example
df <- data.frame(
Name = c("Name1", "Name2", "Name3", "Name4", "Name5"),
Date = c("2018-01-01", "2018-01-02"),
Value1 = c(rnorm(5, 2, 3), rnorm(5, 4, 1)),
Value2 = c(rnorm(5, 12, 4), rnorm(5, 5, 8)),
Value3 = c(rnorm(5, 22, 13), rnorm(5, 7, 10))
)
# arrange by Name and Date (dplyr must be attached for %>% to be available)
library(dplyr)
df <- df %>% arrange(Name, Date)
# attempt to replace
df[length(df$Name), c(3:5)] <- df[length(df$Name)-1, c(3:5)]
# result
tail(df, n = 4)
> tail(df, n = 4)
Name Date Value1 Value2 Value3
7 Name4 2018-01-01 3.242383 -11.44217 -1.215688
8 Name4 2018-01-02 -4.042093 18.18184 1.544271
9 Name5 2018-01-01 -1.930195 13.18662 18.889372
10 Name5 2018-01-02 -1.930195 13.18662 18.889372
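The assignment above indexes row nrow(df) of the whole data frame, which is why only the final group (Name5) changes. A minimal base R sketch of a per-group version, applied to the split list, might look like the following. The helper name copy_prev_row, the list name df_list, and the fixed seed are illustrative assumptions, and the sketch assumes that sorting by Date defines which row counts as "last" within each Name.

```r
# Toy data with the same shape as the reprex (seed added only for repeatability)
set.seed(1)
df <- data.frame(
  Name   = rep(paste0("Name", 1:5), times = 2),
  Date   = rep(c("2018-01-01", "2018-01-02"), each = 5),
  Value1 = rnorm(10),
  Value2 = rnorm(10),
  Value3 = rnorm(10)
)

# Sort first so that "last row" means the latest Date within each Name
df <- df[order(df$Name, df$Date), ]
df_list <- split(df, df$Name)

# Hypothetical helper: overwrite the chosen columns of the last row
# with the values from the row just before it
copy_prev_row <- function(d, cols) {
  n <- nrow(d)
  if (n >= 2) {          # groups with a single row are left untouched
    d[n, cols] <- d[n - 1, cols]
  }
  d
}

# Select the value columns by name rather than by position (3:5)
value_cols <- grep("^Value", names(df), value = TRUE)
df_list <- lapply(df_list, copy_prev_row, cols = value_cols)

# Optional: bind the pieces back into a single data frame
result <- do.call(rbind, df_list)
rownames(result) <- NULL
```

Selecting columns with grep() instead of 3:5 is meant to scale to the 100+ column case, provided the columns to copy share a recognisable name pattern.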
Upvotes: 1
Views: 920
Reputation: 39174
A tidyverse solution. I don't think converting to a list is necessary. df is the data frame in your example. We can replace the Value columns in the last row of each group with NA and then use fill to carry the previous row's values down.
library(tidyverse)
df2 <- df %>%
group_by(Name) %>%
mutate_at(vars(starts_with("Value")),
funs(ifelse(row_number() == max(row_number()), NA, .))) %>%
fill(starts_with("Value")) %>%
ungroup()
df2
# # A tibble: 10 x 5
# Name Date Value1 Value2 Value3
# <fct> <fct> <dbl> <dbl> <dbl>
# 1 Name1 2018-01-01 1.35 14.5 34.2
# 2 Name1 2018-01-02 1.35 14.5 34.2
# 3 Name2 2018-01-02 2.42 4.43 19.5
# 4 Name2 2018-01-01 2.42 4.43 19.5
# 5 Name3 2018-01-01 4.60 14.1 15.8
# 6 Name3 2018-01-02 4.60 14.1 15.8
# 7 Name4 2018-01-02 6.36 11.4 9.40
# 8 Name4 2018-01-01 6.36 11.4 9.40
# 9 Name5 2018-01-01 0.214 8.34 33.8
# 10 Name5 2018-01-02 0.214 8.34 33.8
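funs() has been deprecated since dplyr 0.8.0, and mutate_at() was superseded by across() in dplyr 1.0.0, so the same NA-then-fill idea can be sketched with the newer verbs. This is an illustrative rewrite, not the original answer's code: the seed and the toy data are assumptions, and arrange() is added on the assumption that the last row should be the latest Date within each Name. Note that fill() would also fill any NA values already present in the Value columns.

```r
library(dplyr)
library(tidyr)

# Toy data with the same shape as the reprex (seed added only for repeatability)
set.seed(1)
df <- data.frame(
  Name   = rep(paste0("Name", 1:5), times = 2),
  Date   = rep(c("2018-01-01", "2018-01-02"), each = 5),
  Value1 = rnorm(10),
  Value2 = rnorm(10),
  Value3 = rnorm(10)
)

df2 <- df %>%
  arrange(Name, Date) %>%
  group_by(Name) %>%
  # Blank out the last row of each group (only if the group has a prior row)
  mutate(across(starts_with("Value"),
                ~ replace(.x, row_number() == n() & n() > 1, NA))) %>%
  # Carry the previous row's values down into the blanked row
  fill(starts_with("Value"), .direction = "down") %>%
  ungroup()
```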
The following could be even better. This one does not use the fill function, and it does not change the row order either.
df2 <- df %>%
group_by(Name) %>%
mutate_at(vars(starts_with("Value")),
funs(ifelse(row_number() == max(row_number()),
nth(., n = max(row_number()) - 1),
.))) %>%
ungroup()
df2
# # A tibble: 10 x 5
# Name Date Value1 Value2 Value3
# <fct> <fct> <dbl> <dbl> <dbl>
# 1 Name1 2018-01-01 4.40 13.5 28.0
# 2 Name2 2018-01-02 1.82 8.23 20.9
# 3 Name3 2018-01-01 1.07 16.9 7.50
# 4 Name4 2018-01-02 1.09 8.05 14.4
# 5 Name5 2018-01-01 1.17 11.6 24.0
# 6 Name1 2018-01-02 4.40 13.5 28.0
# 7 Name2 2018-01-01 1.82 8.23 20.9
# 8 Name3 2018-01-02 1.07 16.9 7.50
# 9 Name4 2018-01-01 1.09 8.05 14.4
# 10 Name5 2018-01-02 1.17 11.6 24.0
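With current dplyr, the same replacement can also be sketched using across() and lag(), which avoids the deprecated funs(). This is an illustrative variant under stated assumptions: the seed and the toy data are made up, and the rows are assumed to already be in the intended order within each Name, because "last row" here means last by position in the group.

```r
library(dplyr)

# Toy data with the same shape as the reprex (seed added only for repeatability)
set.seed(1)
df <- data.frame(
  Name   = rep(paste0("Name", 1:5), times = 2),
  Date   = rep(c("2018-01-01", "2018-01-02"), each = 5),
  Value1 = rnorm(10),
  Value2 = rnorm(10),
  Value3 = rnorm(10)
)

df2 <- df %>%
  group_by(Name) %>%
  # In the last row of each group, take the previous row's value;
  # every other row (and any single-row group) keeps its own value
  mutate(across(starts_with("Value"),
                ~ if_else(row_number() == n() & n() > 1, lag(.x), .x))) %>%
  ungroup()
```

Because nothing is sorted, the original row order is preserved. If the data are not already ordered by Date, adding arrange(Name, Date) before group_by() would be needed for the last row to be the latest date.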
Upvotes: 1