Reputation: 3043
First, some data:
library(data.table)
# 1. Input table
df_input <- data.table(
x = c("x1", "x1", "x1", "x2", "x2"),
y = c("y1", "y1", "y2", "y1", "y1"),
z = c(1:5))
In each column, I want to keep only the first value in each run of consecutive values. E.g. look at the y column, which has three different runs: (1) two y1, (2) one y2, and (3) a second run of y1. Within each such run, duplicated values should be replaced with "".
#     x  y z
# 1: x1 y1 1   # 1st value in run of y1: keep
# 2: x1 y1 2   # 2nd value in run: replace
# 3: x1 y2 3   # 1st value in run: keep
# 4: x2 y1 4   # 1st value in 2nd run of y1: keep
# 5: x2 y1 5   # 2nd value: replace
Thus, the desired output table:
df_output <- data.table(
x = c("x1", "", "", "x2", ""),
y = c("y1", "", "y2", "y1", ""),
z = c(1:5))
#     x  y z
# 1: x1 y1 1
# 2:       2
# 3:    y2 3
# 4: x2 y1 4
# 5:       5
How can I get the output table using the dplyr or data.table packages?
Thanks
Upvotes: 1
Views: 129
Reputation: 887148
We can use set with data.table:
library(data.table)
for(j in names(df_input))
set(df_input, i = which(duplicated(rleid(df_input[[j]]))), j = j, value = '')
df_input
#    x  y z
#1: x1 y1 1
#2:       2
#3:    y2 3
#4: x2 y1 4
#5:       5
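Note that set() modifies df_input by reference, so the original values are overwritten. A minimal sketch, assuming you want to keep the input intact: work on a copy() and restrict the loop to the columns you want blanked. The names df_clean and cols are only illustrative.

```r
library(data.table)

df_input <- data.table(
  x = c("x1", "x1", "x1", "x2", "x2"),
  y = c("y1", "y1", "y2", "y1", "y1"),
  z = 1:5)

# copy() makes a real copy, so set() below does not touch df_input
df_clean <- copy(df_input)

# hypothetical choice: blank only the character columns
cols <- c("x", "y")

for (j in cols) {
  # rows that are not the first element of their run
  idx <- which(duplicated(rleid(df_clean[[j]])))
  set(df_clean, i = idx, j = j, value = "")
}
```

Here df_clean holds the blanked values while df_input still has the original ones.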
Upvotes: 2
Reputation: 388982
We can use rleid with duplicated to replace consecutive repeating values with an empty value ('').
library(data.table)
df_input[, lapply(.SD, function(x) replace(x, duplicated(rleid(x)), ''))]
#    x  y z
#1: x1 y1 1
#2:       2
#3:    y2 3
#4: x2 y1 4
#5:       5
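To see why this works, it may help to look at the intermediate steps for the y column on its own. This is just an illustration; run_id and is_repeat are names made up here.

```r
library(data.table)

y <- c("y1", "y1", "y2", "y1", "y1")

# rleid() gives every run of consecutive equal values its own id
run_id <- rleid(y)
run_id
# 1 1 2 3 3

# within a run, everything after the first element is a duplicate of the id
is_repeat <- duplicated(run_id)
is_repeat
# FALSE TRUE FALSE FALSE TRUE

# blank out the repeats
y_out <- replace(y, is_repeat, "")
y_out
# "y1" "" "y2" "y1" ""
```

The second run of y1 gets a new id (3), which is why its first value is kept.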
Using it in dplyr:
library(dplyr)
# rleid() comes from data.table, so that package must be loaded as well
df_input %>% mutate_all(~replace(., duplicated(rleid(.)), ''))
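If you would rather not depend on data.table in the dplyr version, here is a sketch under the assumption that dplyr 1.1.0 or later is installed, where consecutive_id() plays the role of rleid(). It uses across() instead of mutate_all() and only touches x and y, so z stays an integer column. The names df and df_out are illustrative.

```r
library(dplyr)

df <- tibble(
  x = c("x1", "x1", "x1", "x2", "x2"),
  y = c("y1", "y1", "y2", "y1", "y1"),
  z = 1:5)

# assumes dplyr >= 1.1.0 for consecutive_id()
df_out <- df %>%
  mutate(across(c(x, y),
                ~ replace(.x, duplicated(consecutive_id(.x)), "")))
```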
Upvotes: 2