Reputation: 1519
I have a data.table that looks like this
dt <- data.table(ID=c("A","A","B","B"),Amount1=c(100,200,300,400),
Amount2=c(1500,1500,2400,2400),Dupl=c(1,0,1,0))
ID Amount1 Amount2 Dupl
1: A 100 1500 1
2: A 200 1500 0
3: B 300 2400 1
4: B 400 2400 0
I need to duplicate each row that has a 1 in the Dupl column and replace the Amount1 value with the Amount2 value in that duplicated row. Besides that I need to give that duplicated row the value 2 in Dupl. This means it should look like this:
ID Amount1 Amount2 Dupl
1: A 100 1500 1
2: A 1500 1500 2
3: A 200 1500 0
4: B 300 2400 1
5: B 2400 2400 2
6: B 400 2400 0
Any help is much appreciated! Kind regards,
Tim
Upvotes: 17
Views: 20601
Reputation: 56119
Using dplyr
library("dplyr")
rbind(dt,
dt %>%
filter(Dupl == 1) %>%
mutate(Dupl = 2,
Amount1 = Amount2))
# ID Amount1 Amount2 Dupl
# 1: A 100 1500 1
# 2: A 200 1500 0
# 3: B 300 2400 1
# 4: B 400 2400 0
# 5: A 1500 1500 2
# 6: B 2400 2400 2
Upvotes: 15
Reputation: 115
Biased here, but I think this dplyr solution is elegant, and it's also pretty scalable, especially as long as Dupl
is always <= 2. Essentially, it takes advantage of tidyr::uncount
, which says, 'Based on a given column's value (x), repeat each row x times, thereby elongating the df.' Once we've lengthened the df, we can just use dplyr::mutate_at
to replace cells if they're the same value as their lag.
library(tidyverse)
dt %>%
uncount(Dupl + 1) %>%
mutate_at(vars(Amount1),
~case_when(. == lag(.) ~ Amount2, TRUE ~.)) %>%
mutate_at(vars(Dupl),
~case_when(. == lag(.) ~ 2, TRUE ~.))
# ID Amount1 Amount2 Dupl
# 1: A 100 1500 1
# 2: A 1500 1500 2
# 3: A 200 1500 0
# 4: B 300 2400 1
# 5: B 2400 2400 2
# 6: B 400 2400 0
Upvotes: 1
Reputation: 43
Using dplyr's left_join
to do the duplication work. Perhaps not elegant, but should be easy to understand.
library(data.table)
library(dplyr)
joiner <- data.frame(Dupl = 1, helper_col= 1:2)
dt <- left_join(dt, joiner) %>%
mutate(Dupl = ifelse(helper_col == 2 & !is.na(helper_col), 2, Dupl)) %>%
select(-helper_col) %>%
mutate(Amount1 = ifelse(Dupl == 2, Amount2, Amount1))
> dt
ID Amount1 Amount2 Dupl
1 A 100 1500 1
2 A 1500 1500 2
3 A 200 1500 0
4 B 300 2400 1
5 B 2400 2400 2
6 B 400 2400 0
Upvotes: 2
Reputation: 66834
You can rbind
a copy of the sub-setted data with the correct transformations done:
rbind(dt,copy(dt[Dupl==1])[,Amount1:=Amount2][,Dupl:=Dupl+1])
ID Amount1 Amount2 Dupl
1: A 100 1500 1
2: A 200 1500 0
3: B 300 2400 1
4: B 400 2400 0
5: A 1500 1500 2
6: B 2400 2400 2
Alternatively, you can get the duplicates by sub-setting, and then transform the duplicated rows using an intermediate step. This keeps the duplicated row next to the original as in the example in the question:
x <- dt[rep(seq(dt[,Dupl]),times=dt[,Dupl==1]+1)]
x[duplicated(x),c("Amount1","Dupl"):=list(Amount2,Dupl+1)]
x
ID Amount1 Amount2 Dupl
1: A 100 1500 1
2: A 1500 1500 2
3: A 200 1500 0
4: B 300 2400 1
5: B 2400 2400 2
6: B 400 2400 0
Upvotes: 6
Reputation: 887038
You could try
rbind(dt,dt[Dupl==1][,c('Amount1', 'Dupl') := list(Amount2, 2)])
Upvotes: 12
Reputation: 193517
This seems to do what you are asking for. Can probably be refined a bit...
library(splitstackshape)
expandRows(dt, dt$Dupl+1, count.is.col = FALSE)[
Dupl != 0, Dupl := cumsum(Dupl), by = ID][
, Amount1 := ifelse(Dupl > 1, Amount2[-1], Amount1)][]
# ID Amount1 Amount2 Dupl
# 1: A 100 1500 1
# 2: A 1500 1500 2
# 3: A 200 1500 0
# 4: B 300 2400 1
# 5: B 2400 2400 2
# 6: B 400 2400 0
Upvotes: 3