Reputation: 481
I want to perform an operation on a subset of rows in a data.table
that result in a greater number of rows than what I started out with. Is there an easy way to expand the original data.table
to accommodate this? If not, how could I accomplish this?
Here's a sample of my original data.
DT <- data.table(my.id=c(1,2,3), unmodified=c("a","b","c"), vals=c("apple",NA,"cat"))
DT
my.id unmodified vals
1: 1 a apple
2: 2 b NA
3: 3 c cat
And this is my desired output.
DT
my.id unmodified vals
1: 1 a apple
2: 2 b boy
3: 2 b bat
4: 2 b bag
5: 3 c cat
The new rows can appear at the end as well, I don't care about the order. I tried DT[my.id == 2, vals := c("boy","bat","bag")]
, but it ignores the last 2 entries with a warning.
TIA!
EDIT: My original dataset has about 10 million rows, although the entry with a missing value occurs just once. I'd prefer not to create copies of the data.table
, if possible.
Upvotes: 0
Views: 40
Reputation: 3492
> DT <- data.table(my.id=c(1,2,3), unmodified=c("a","b","c"), vals=c("apple",NA,"cat"))
> DT
my.id unmodified vals
1: 1 a apple
2: 2 b NA
3: 3 c cat
> DT2 <- data.table(my.id=rep(2,3), unmodified=rep("b",3), vals=c("boy","bat","bag"))
> DT2
my.id unmodified vals
1: 2 b boy
2: 2 b bat
3: 2 b bag
> rbind(DT,DT2)
my.id unmodified vals
1: 1 a apple
2: 2 b NA
3: 3 c cat
4: 2 b boy
5: 2 b bat
6: 2 b bag
> rbind(DT,DT2)[order(my.id),]
my.id unmodified vals
1: 1 a apple
2: 2 b NA
3: 2 b boy
4: 2 b bat
5: 2 b bag
6: 3 c cat
> na.omit(rbind(DT,DT2)[order(my.id),])
my.id unmodified vals
1: 1 a apple
2: 2 b boy
3: 2 b bat
4: 2 b bag
5: 3 c cat
Upvotes: 0
Reputation: 886938
Another option is to subset the datasets that have 'my.id' as 2 and not 2, then rbind
rbind(DT[my.id == 2][, .(my.id, unmodified, vals = c('boy', 'bat',
'bag'))], DT[my.id != 2])[order(my.id)]
# my.id unmodified vals
#1: 1 a apple
#2: 2 b boy
#3: 2 b bat
#4: 2 b bag
#5: 3 c cat
Upvotes: 0
Reputation: 214927
You can use the summarize
pattern of data.table by setting the group variables to be my.id
and unmodified
here; this broadcasts values within each group if the length doesn't match:
DT[, .(vals = if(my.id == 2) c("boy","bat","bag") else vals), .(my.id, unmodified)]
# my.id unmodified vals
#1: 1 a apple
#2: 2 b boy
#3: 2 b bat
#4: 2 b bag
#5: 3 c cat
Upvotes: 1