R - Replace row variables within a data frame with variables from another row

Question

I have a list of data frames similar to the reprex below but with 100+ columns:

# reproducible example
df <- data.frame(
  Name = c("Name1", "Name2", "Name3", "Name4", "Name5"),
  Date = c("2018-01-01", "2018-01-02"),
  Value1 = c(rnorm(5, 2, 3), rnorm(5, 4, 1)),
  Value2 = c(rnorm(5, 12, 4), rnorm(5, 5, 8)),
  Value3 = c(rnorm(5, 22, 13), rnorm(5, 7, 10))
)

# transform data frame into list
df <- split(df, df$Name)

For each data frame in the list, I would like to replace the values in the last row with the values from the row before it. For example, for each data frame in the list, I would like to replace [2, 3:5] with [1, 3:5].

> tail(df[["Name1"]], n = 2)
   Name       Date    Value1    Value2    Value3
1 Name1 2018-01-01 0.9184539 15.658510 29.219707
2 Name1 2018-01-02 3.8875463  3.628546  9.777399

I'm not sure whether transforming my data frame into a list is the best way to go about this, so other suggestions are welcome. I tried the approach outlined below, but it only replaces the last row of the whole data frame with the second-to-last row, rather than the last row for each Name.

My Attempt

# reproducible example
df <- data.frame(
  Name = c("Name1", "Name2", "Name3", "Name4", "Name5"),
  Date = c("2018-01-01", "2018-01-02"),
  Value1 = c(rnorm(5, 2, 3), rnorm(5, 4, 1)),
  Value2 = c(rnorm(5, 12, 4), rnorm(5, 5, 8)),
  Value3 = c(rnorm(5, 22, 13), rnorm(5, 7, 10))
)

# arrange by Name and Date (dplyr provides arrange() and the %>% pipe)
library(dplyr)
df <- df %>% arrange(Name, Date)

# attempt to replace
df[length(df$Name), c(3:5)] <- df[length(df$Name)-1, c(3:5)]

# result
tail(df, n = 4)

> tail(df, n = 4)
    Name       Date    Value1    Value2    Value3
7  Name4 2018-01-01  3.242383 -11.44217 -1.215688
8  Name4 2018-01-02 -4.042093  18.18184  1.544271
9  Name5 2018-01-01 -1.930195  13.18662 18.889372
10 Name5 2018-01-02 -1.930195  13.18662 18.889372

www · Accepted Answer

A tidyverse solution. I don't think converting to a list is necessary. df is the data frame in your example. Within each Name group, we can set the Value columns of the last row to NA and then use fill to carry the previous row's values down into it.

library(tidyverse)

df2 <- df %>%
  group_by(Name) %>%
  # formula lambda instead of funs(), which is deprecated since dplyr 0.8.0
  mutate_at(vars(starts_with("Value")), 
            ~ ifelse(row_number() == max(row_number()), NA, .)) %>%
  fill(starts_with("Value")) %>%
  ungroup()
df2
# # A tibble: 10 x 5
#    Name  Date       Value1 Value2 Value3
#  1 Name1 2018-01-01  1.35   14.5   34.2 
#  2 Name1 2018-01-02  1.35   14.5   34.2 
#  3 Name2 2018-01-02  2.42    4.43  19.5 
#  4 Name2 2018-01-01  2.42    4.43  19.5 
#  5 Name3 2018-01-01  4.60   14.1   15.8 
#  6 Name3 2018-01-02  4.60   14.1   15.8 
#  7 Name4 2018-01-02  6.36   11.4    9.40
#  8 Name4 2018-01-01  6.36   11.4    9.40
#  9 Name5 2018-01-01  0.214   8.34  33.8 
# 10 Name5 2018-01-02  0.214   8.34  33.8
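
The NA-then-fill idea above can also be written with across(). The following is a minimal sketch, assuming dplyr 1.0.0 or later; the data frame toy and its values are made up for illustration and are not from the question. Note that fill() fills every NA in the selected columns, so this approach assumes the Value columns have no other missing values that should be kept.

```r
library(dplyr)
library(tidyr)

# Hypothetical data with known values so the result is easy to check
toy <- data.frame(
  Name   = rep(c("Name1", "Name2"), each = 3),
  Date   = rep(c("2018-01-01", "2018-01-02", "2018-01-03"), times = 2),
  Value1 = c(1, 2, 3, 10, 20, 30),
  Value2 = c(4, 5, 6, 40, 50, 60)
)

toy2 <- toy %>%
  group_by(Name) %>%
  # n() is the size of the current group, so this blanks the last row per Name
  mutate(across(starts_with("Value"),
                ~ ifelse(row_number() == n(), NA, .x))) %>%
  # fill() works within groups and fills downward by default
  fill(starts_with("Value")) %>%
  ungroup()

toy2
```

In toy2, the third row of each Name should carry the values of the second row, while the first two rows stay as they were.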

A second approach may be even better: it does not use the fill function, and it does not change the row order either.

df2 <- df %>%
  group_by(Name) %>%
  # formula lambda instead of funs(), which is deprecated since dplyr 0.8.0
  mutate_at(vars(starts_with("Value")), 
            ~ ifelse(row_number() == max(row_number()), 
                     nth(., n = max(row_number()) - 1),
                     .)) %>%
  ungroup()
df2
# # A tibble: 10 x 5
#    Name  Date       Value1 Value2 Value3
#  1 Name1 2018-01-01   4.40  13.5   28.0 
#  2 Name2 2018-01-02   1.82   8.23  20.9 
#  3 Name3 2018-01-01   1.07  16.9    7.50
#  4 Name4 2018-01-02   1.09   8.05  14.4 
#  5 Name5 2018-01-01   1.17  11.6   24.0 
#  6 Name1 2018-01-02   4.40  13.5   28.0 
#  7 Name2 2018-01-01   1.82   8.23  20.9 
#  8 Name3 2018-01-02   1.07  16.9    7.50
#  9 Name4 2018-01-01   1.09   8.05  14.4 
# 10 Name5 2018-01-02   1.17  11.6   24.0
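
For readers on a recent dplyr, the direct replacement can be sketched with across() and base replace() instead of ifelse() and nth(). This is a swapped-in variant, not the code from the answer. It assumes dplyr 1.0.0 or later and that every group has at least two rows. The data frame toy, its values, and the names value_cols and toy_list are made up for illustration. The second half shows a base R equivalent for a list produced by split(), as in the question.

```r
library(dplyr)

# Hypothetical data with known values so the result is easy to check
toy <- data.frame(
  Name   = rep(c("Name1", "Name2"), each = 3),
  Date   = rep(c("2018-01-01", "2018-01-02", "2018-01-03"), times = 2),
  Value1 = c(1, 2, 3, 10, 20, 30),
  Value2 = c(4, 5, 6, 40, 50, 60)
)

# dplyr: overwrite element n() of each Value column with element n() - 1
toy2 <- toy %>%
  group_by(Name) %>%
  mutate(across(starts_with("Value"),
                ~ replace(.x, n(), .x[n() - 1]))) %>%
  ungroup()

# Base R: the same replacement applied to each data frame in a list
value_cols <- grep("^Value", names(toy), value = TRUE)
toy_list <- lapply(split(toy, toy$Name), function(d) {
  last <- nrow(d)
  # skip groups with a single row, which have no previous row to copy
  if (last >= 2) d[last, value_cols] <- d[last - 1, value_cols]
  d
})

toy2
toy_list[["Name2"]]
```

Both variants leave the row order and the Name and Date columns untouched. Only the Value columns of the last row per Name change.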
