I have a Part A made up of 3 Sections. First I describe what I do successfully with this single Part, then where I fail when there are more Parts.
The Section lengths are in the data.table d1.
library(data.table)
d1 <- data.table(
Part="A",
Section=1:3,
SecLen=c(10,30,9))
d1
# Part Section SecLen
# 1: A 1 10
# 2: A 2 30
# 3: A 3 9
I also have a set of locations along the Part in d2.
d2 <- data.table(
Part="A",
PartLoc=c(0,7.5,10,20,35,45,49))
d2
# Part PartLoc
# 1: A 0.0
# 2: A 7.5
# 3: A 10.0
# 4: A 20.0
# 5: A 35.0
# 6: A 45.0
# 7: A 49.0
I want to add the Section that each location falls in.
First I accumulate the Section lengths in d1 using cumsum:
d1[,CumLen:=cumsum(SecLen)]
d1
# Part Section SecLen CumLen
# 1: A 1 10 10
# 2: A 2 30 40
# 3: A 3 9 49
Then I use findInterval to map each location to its Section. Note that I want location 10 to be assigned to Section 1, not 2, which is why I use left.open=TRUE.
d2[,Sec.fI:=findInterval(PartLoc,c(-1,d1$CumLen),left.open=TRUE)]
d2
# Part PartLoc Sec.fI
# 1: A 0.0 1
# 2: A 7.5 1
# 3: A 10.0 1
# 4: A 20.0 2
# 5: A 35.0 2
# 6: A 45.0 3
# 7: A 49.0 3
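To see why left.open=TRUE and the leading -1 matter, here is a small sketch with Part A's breakpoints typed in by hand (brks and loc are illustrative names, not from the question). With the default left-closed intervals, location 10 lands in Section 2 and location 49 falls past the last breakpoint; with left.open=TRUE both stay where the question wants them. The -1 lower bound keeps location 0 in Section 1, because a breakpoint of exactly 0 would give it index 0.

```r
# Breakpoints for Part A: a -1 sentinel followed by the cumulative Section ends
brks <- c(-1, 10, 40, 49)
loc  <- c(0, 7.5, 10, 20, 35, 45, 49)

# Default: intervals are [a, b), so 10 falls in Section 2 and 49 past the end (4)
findInterval(loc, brks)
# left.open = TRUE: intervals are (a, b], so 10 stays in Section 1 and 49 in Section 3
findInterval(loc, brks, left.open = TRUE)

# With 0 instead of -1 as the first breakpoint, location 0 would get index 0
findInterval(0, c(0, 10, 40, 49), left.open = TRUE)
```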
Another approach uses a data.table join.
First I add the start locations of each Section.
d1[,CumLen0:=c(-1,head(CumLen,-1))]
d1
# Part Section SecLen CumLen CumLen0
# 1: A 1 10 10 -1
# 2: A 2 30 40 10
# 3: A 3 9 49 40
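As an aside, data.table's shift() should give the same start locations as c(-1, head(CumLen, -1)). A minimal sketch, assuming -1 is the sentinel you want for the start of Section 1; with several Parts you would add by = Part, as in the question.

```r
library(data.table)

d1 <- data.table(Part = "A", Section = 1:3, SecLen = c(10, 30, 9))
d1[, CumLen := cumsum(SecLen)]
# shift() lags CumLen by one row; fill = -1 supplies the start of Section 1
d1[, CumLen0 := shift(CumLen, n = 1L, fill = -1)]
d1[]
```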
Then I look up the Section with a non-equi join.
d2[,Sec.cs:=d1[d2,Section,on=.(CumLen0<PartLoc,CumLen>=PartLoc)]]
d2
# Part PartLoc Sec.fI Sec.cs
# 1: A 0.0 1 1
# 2: A 7.5 1 1
# 3: A 10.0 1 1
# 4: A 20.0 2 2
# 5: A 35.0 2 2
# 6: A 45.0 3 3
# 7: A 49.0 3 3
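To check which interval each location matched, the same non-equi join can return d1's own bounds through the x. prefix (From, To and bounds are illustrative names). The prefix matters because, in a non-equi join's default output, the join columns typically carry the values from i (here d2's PartLoc) rather than d1's CumLen0 and CumLen.

```r
library(data.table)

d1 <- data.table(Part = "A", Section = 1:3, SecLen = c(10, 30, 9))
d1[, CumLen := cumsum(SecLen)]
d1[, CumLen0 := c(-1, head(CumLen, -1))]
d2 <- data.table(Part = "A", PartLoc = c(0, 7.5, 10, 20, 35, 45, 49))

# Non-equi join: each d2 row matches the d1 row with CumLen0 < PartLoc <= CumLen.
# x.CumLen0 / x.CumLen return d1's own bounds for the matched Section.
bounds <- d1[d2, .(Section, From = x.CumLen0, To = x.CumLen),
             on = .(CumLen0 < PartLoc, CumLen >= PartLoc)]
cbind(d2, bounds)
```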
Either way works.
Now I try with more Parts.
D1 <- data.table(
Part = c("A","A","A","B","B","C"),
Section = c(1,2,3,1,2,1),
SecLen = c(10,30,9,5,20,18) # incorrectly had 10 for the last value
)
D2 <- data.table(
Part = c(rep("A",7),rep("B",3),rep("C",3)),
PartLoc = c(0.0,7.5,10,20,35,45,49,1,12,25,0,9,18)
)
D1[,CumLen:=cumsum(SecLen),by=Part]
D1
# Part Section SecLen CumLen
# 1: A 1 10 10
# 2: A 2 30 40
# 3: A 3 9 49
# 4: B 1 5 5
# 5: B 2 20 25
# 6: C 1 18 18
D2
# Part PartLoc
# 1: A 0.0
# 2: A 7.5
# 3: A 10.0
# 4: A 20.0
# 5: A 35.0
# 6: A 45.0
# 7: A 49.0
# 8: B 1.0
# 9: B 12.0
# 10: B 25.0
# 11: C 0.0
# 12: C 9.0
# 13: C 18.0
I try findInterval by Part.
D2[,Sec.fI:=findInterval(PartLoc,c(-1,D1$CumLen),left.open=TRUE),by=Part]
# Error in findInterval(PartLoc, c(-1, D1$CumLen), left.open = TRUE) :
# 'vec' must be sorted non-decreasingly and not contain NAs
It fails because D1$CumLen is not split by Part: every group sees the whole column, so the breakpoints c(-1, D1$CumLen) = c(-1, 10, 40, 49, 5, 25, 18) are not sorted non-decreasingly.
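A quick way to confirm the diagnosis is to inspect the pooled breakpoints directly. This sketch rebuilds D1 from the question and uses base R's is.unsorted() (brks is an illustrative name):

```r
library(data.table)

D1 <- data.table(
  Part    = c("A","A","A","B","B","C"),
  Section = c(1,2,3,1,2,1),
  SecLen  = c(10,30,9,5,20,18)
)
D1[, CumLen := cumsum(SecLen), by = Part]

# Pooled over all Parts, the breakpoints restart at each new Part
brks <- c(-1, D1$CumLen)
brks               # -1 10 40 49 5 25 18
is.unsorted(brks)  # TRUE, hence the findInterval() error

# Within a single Part they are sorted, which is what findInterval() needs
is.unsorted(c(-1, D1[Part == "B", CumLen]))  # FALSE
```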
I try the join.
D1[,CumLen0:=c(-1,head(CumLen,-1)),by=Part]
D2[,Sec.cs:=D1[D2,Section,on=.(CumLen0<PartLoc,CumLen>=PartLoc),by=Part]]
# Error in vecseq(f__, len__, if (allow.cartesian || notjoin || !anyDuplicated(f__, :
# Join results in 31 rows; more than 19 = nrow(x)+nrow(i).
# Check for duplicate key values in i each of which join to the same group in x over and over again.
# If that's ok, try by=.EACHI to run j for each group to avoid the large allocation.
# If you are sure you wish to proceed, rerun with allow.cartesian=TRUE.
# Otherwise, please search for this error message in the FAQ, Wiki, Stack Overflow and data.table issue tracker for advice.
Here I am running into the limits of my nascent knowledge of joins. I try allow.cartesian=TRUE as suggested, but that only confirms that the join produces 31 rows instead of the 13 I want.
D1[D2,Section,on=.(CumLen0<PartLoc,CumLen>=PartLoc),by=Part,allow.cartesian=TRUE]
# Part Section
# 1: A 1
# 2: A 1
# 3: A 1
# 4: A 2
# 5: A 2
# 6: A 3
# 7: A 3
# 8: A 1
# 9: A 2
# 10: A 2
# 11: A 1
# 12: A 1
# 13: A 2
# 14: B 1
# 15: B 2
# 16: B 2
# 17: B 2
# 18: B 1
# 19: B 2
# 20: B 2
# 21: B 1
# 22: B 2
# 23: B 2
# 24: C 1
# 25: C 1
# 26: C 1
# 27: C 1
# 28: C 1
# 29: C 1
# 30: C 1
# 31: C 1
# Part Section
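The 31 rows can be traced by counting matches per D2 row. In the sketch below (hits is an illustrative name), by=.EACHI counts how many D1 rows each D2 row joins to. Because Part is not in on=, the intervals of every Part are searched, so a location such as 7.5 matches Section 1 of A, Section 2 of B and Section 1 of C. by=Part then only groups the already multiplied result by D1's Part, which is why the output above has 13 rows for A, 10 for B and 8 for C.

```r
library(data.table)

D1 <- data.table(Part = c("A","A","A","B","B","C"),
                 Section = c(1,2,3,1,2,1),
                 SecLen = c(10,30,9,5,20,18))
D1[, CumLen := cumsum(SecLen), by = Part]
D1[, CumLen0 := c(-1, head(CumLen, -1)), by = Part]
D2 <- data.table(Part = c(rep("A",7), rep("B",3), rep("C",3)),
                 PartLoc = c(0.0,7.5,10,20,35,45,49,1,12,25,0,9,18))

# Without Part in on=, each D2 row matches every Section of every Part
# whose interval (CumLen0, CumLen] contains PartLoc
hits <- D1[D2, .N, on = .(CumLen0 < PartLoc, CumLen >= PartLoc), by = .EACHI]
hits$N       # matches per D2 row
sum(hits$N)  # 31
```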
I'd appreciate help getting this to work in data.table.
Edit
I am satisfied with the findInterval solution by @Ian Campbell. I would still be interested in seeing how to make the join work.
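For the join itself, a minimal sketch, assuming the only missing piece is Part as an equality condition in on=. Once Part is part of the join, each D2 row can only match a Section of its own Part, each location falls into exactly one interval, and neither by= nor allow.cartesian is needed.

```r
library(data.table)

D1 <- data.table(Part = c("A","A","A","B","B","C"),
                 Section = c(1,2,3,1,2,1),
                 SecLen = c(10,30,9,5,20,18))
D1[, CumLen := cumsum(SecLen), by = Part]
D1[, CumLen0 := c(-1, head(CumLen, -1)), by = Part]
D2 <- data.table(Part = c(rep("A",7), rep("B",3), rep("C",3)),
                 PartLoc = c(0.0,7.5,10,20,35,45,49,1,12,25,0,9,18))

# Equality on Part plus the two interval conditions: one match per D2 row
D2[, Sec.cs := D1[D2, Section, on = .(Part, CumLen0 < PartLoc, CumLen >= PartLoc)]]
D2[]
```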
Answer:
I'm a bit lost in your question, but based at least on the first three sentences, here is an option using a rolling join:
d1[, LastPage := cumsum(SecLen)]
d2[, Section :=
d1[.SD, on=.(Part, LastPage=PartLoc), roll=-Inf, Section]
]
output:
Part PartLoc Section
1: A 0.0 1
2: A 7.5 1
3: A 10.0 1
4: A 20.0 2
5: A 35.0 2
6: A 45.0 3
7: A 49.0 3
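For readers unfamiliar with roll=-Inf: it carries the next observation backward, so each PartLoc picks up the first LastPage at or above it within the same Part, and an exact match such as 10 keeps its own Section. The small sketch below uses made-up ends and locs tables, including a hypothetical location of 50 beyond the end of Part A, to show that with the default rollends for backward rolls such a location gets NA rather than the last Section.

```r
library(data.table)

# Hypothetical Section end points for Part A and a few locations to look up
ends <- data.table(Part = "A", LastPage = c(10, 40, 49), Section = 1:3)
locs <- data.table(Part = "A", PartLoc = c(7.5, 10, 45, 50))

# roll = -Inf: match the next LastPage at or above PartLoc (next observation carried backward)
res <- ends[locs, on = .(Part, LastPage = PartLoc), roll = -Inf, Section]
res  # 1 1 3 NA: 50 lies beyond the last end point, so nothing is rolled back to it
```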
Edit by OP:
I can confirm this works for my case with more Parts.
D2[, Sec.rJ :=
D1[.SD, on=.(Part, CumLen=PartLoc), roll=-Inf, Section]
][]
# Part PartLoc Sec.rJ
# 1: A 0.0 1
# 2: A 7.5 1
# 3: A 10.0 1
# 4: A 20.0 2
# 5: A 35.0 2
# 6: A 45.0 3
# 7: A 49.0 3
# 8: B 1.0 1
# 9: B 12.0 2
# 10: B 25.0 2
# 11: C 0.0 1
# 12: C 9.0 1
# 13: C 18.0 1
Answer:
You could subset D1 with the .BY special symbol:
library(data.table)
D2[,Sec.fI:=findInterval(PartLoc,c(-1,D1[Part == .BY$Part,CumLen]),left.open=TRUE),by=Part][]
Part PartLoc Sec.fI
1: A 0.0 1
2: A 7.5 1
3: A 10.0 1
4: A 20.0 2
5: A 35.0 2
6: A 45.0 3
7: A 49.0 3
8: B 1.0 1
9: B 12.0 2
10: B 25.0 2
11: C 0.0 1
12: C 9.0 1
13: C 18.0 1
See help("special-symbols") for more info.
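A variant of the same idea, sketched under the assumption that precomputing the breakpoints is acceptable: split D1$CumLen by Part once into a named list (brks is an illustrative name) and use .BY$Part, the current group's Part value, to pick the right vector inside by=Part. This avoids re-subsetting D1 in every group.

```r
library(data.table)

D1 <- data.table(Part = c("A","A","A","B","B","C"),
                 Section = c(1,2,3,1,2,1),
                 SecLen = c(10,30,9,5,20,18))
D1[, CumLen := cumsum(SecLen), by = Part]
D2 <- data.table(Part = c(rep("A",7), rep("B",3), rep("C",3)),
                 PartLoc = c(0.0,7.5,10,20,35,45,49,1,12,25,0,9,18))

# One sorted breakpoint vector per Part, e.g. brks[["B"]] is c(-1, 5, 25)
brks <- lapply(split(D1$CumLen, D1$Part), function(x) c(-1, x))

# Inside by = Part, .BY$Part names the current group, selecting its breakpoints
D2[, Sec.fI := findInterval(PartLoc, brks[[.BY$Part]], left.open = TRUE), by = Part]
D2[]
```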