Reputation: 2346
I have tables like:
x = data.table(Id=c(1,1,2,3,3,4), Name=c("A", "A", "B", "C", "C", "D"), TxId=c(10, 11, 20, 30, 31, 40))
#Id Name TxId
#1: 1 A 10
#2: 1 A 11
#3: 2 B 20
#4: 3 C 30
#5: 3 C 31
#6: 4 D 40
y = data.table(Name=c("A", "B", "B", "C"), Family=c("A-alpha", "B-beta", "B-gamma", "C-delta"))
# Name Family
#1: A A-alpha
#2: B B-beta
#3: B B-gamma
#4: C C-delta
I can do the left-join and concatenation, but I only want one output row for each row in X.
# Left join X to Y on Name column
xy = y[x, on="Name"]
# Name Family Id TxId
#1: A A-alpha 1 10
#2: A A-alpha 1 11
#3: B B-beta 2 20
#4: B B-gamma 2 20
#5: C C-delta 3 30
#6: C C-delta 3 31
#7: D NA 4 40
# Concatenate Family column
xy[, Family:=paste0(Family, collapse=", "), by=c("Name", "TxId")]
# Name Family Id TxId
#1: A A-alpha 1 10
#2: A A-alpha 1 11
#3: B B-beta, B-gamma 2 20
#4: B B-beta, B-gamma 2 20
#5: C C-delta 3 30
#6: C C-delta 3 31
#7: D NA 4 40
How do I get rid of the extra row for B? I want it to be unique on Id/TxId. i.e.
# Name Family Id TxId
#1: A A-alpha 1 10
#2: A A-alpha 1 11
#3: B B-beta, B-gamma 2 20
#5: C C-delta 3 30
#6: C C-delta 3 31
#7: D NA 4 40
If I do as eddi comments:
xy[, .(Family=paste0(Family, collapse=", "), by=c("Name", "TxId")])
I get the correct result. But if I try to add other columns, it doesn't work (I get the same result as if I had done the :=
version):
xy[, .(Id, Family=paste0(Family, collapse=", ")), by=c("Name", "TxId")]
Upvotes: 0
Views: 274
Reputation: 42592
Please, try
xy[, .(Family = paste0(Family, collapse = ", "), by = c("Id", "Name", "TxId")]
I'll try to explain:
If Id
is part of the group-by it will only appear once for each unique value of Id
(to be exact, for each unique combination of Id
, Name
, TxId
). If Id
is included in the j
-expression, i.e., .(Id, Family = paste0(Family, collapse = ", ")
then every row of Id
will be included in the result set despite Family
is being aggregated.
Upvotes: 1