Replace duplicated values in consecutive runs with blank

Question

First, some data:

library(data.table)

# 1. Input table
df_input <- data.table(
  x = c("x1", "x1", "x1", "x2", "x2"),
  y = c("y1", "y1", "y2", "y1", "y1"),
  z = c(1:5))

In each column, I want to keep only the first value in each run of consecutive values. E.g. look at the y column, which has three different runs: (1) two y1, (2) one y2, and (3) a second run of y1. Within each such run, duplicated values should be replaced with "".

#     x  y z
# 1: x1 y1 1   # 1st value in run of y1: keep
# 2: x1 y1 2   # 2nd value in run: replace
# 3: x1 y2 3   # 1st value in run: keep
# 4: x2 y1 4   # 1st value in 2nd run of y1: keep
# 5: x2 y1 5   # 2nd value: replace

Thus, the desired output table:

df_output <- data.table(
  x = c("x1", "", "",  "x2", ""),
  y = c("y1", "", "y2", "y1", ""),
  z = c(1:5))

#     x  y z
# 1: x1 y1 1
# 2:       2
# 3:    y2 3
# 4: x2 y1 4
# 5:       5

How can I get the output table using the dplyr or data.table packages?

Thanks

Ronak Shah · Accepted Answer

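We can combine rleid with duplicated: rleid gives every run of consecutive equal values its own id, and duplicated flags every element after the first within each run, so those elements can be replaced with an empty string ('').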

library(data.table)
df_input[, lapply(.SD, function(x) replace(x, duplicated(rleid(x)), ''))]


#    x  y z
#1: x1 y1 1
#2:       2
#3:    y2 3
#4: x2 y1 4
#5:       5
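To see why this works, it may help to look at the intermediate values for a single column. The following is a minimal sketch on the y column from the question; the names run_id, is_repeat and result are only illustrative and are not part of the original answer.

```r
library(data.table)

# The y column from the question
y <- c("y1", "y1", "y2", "y1", "y1")

# rleid() numbers each run of consecutive equal values
run_id <- rleid(y)
# 1 1 2 3 3

# duplicated() is TRUE for every element after the first in a run
is_repeat <- duplicated(run_id)
# FALSE TRUE FALSE FALSE TRUE

# replace() blanks out exactly those positions
result <- replace(y, is_repeat, "")
# "y1" "" "y2" "y1" ""
```

Because the run id changes whenever the value changes, the second run of y1 gets its own id (3), so its first element is kept.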

The same idea in dplyr (rleid still comes from data.table, so keep that package loaded):

library(dplyr)
df_input %>% mutate_all(~replace(., duplicated(rleid(.)), ''))
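In recent dplyr versions mutate_all is superseded by across. A possible equivalent, assuming dplyr 1.0.0 or later, is sketched below; df_blank is just an illustrative name, and the input table is repeated so the snippet runs on its own.

```r
library(data.table)  # provides rleid()
library(dplyr)

# Input table from the question
df_input <- data.table(
  x = c("x1", "x1", "x1", "x2", "x2"),
  y = c("y1", "y1", "y2", "y1", "y1"),
  z = 1:5)

# Apply the same replace/duplicated/rleid step to every column
df_blank <- df_input %>%
  mutate(across(everything(), ~ replace(.x, duplicated(rleid(.x)), "")))
```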
