Reputation: 313
I have to merge two datasets (spatial) for which I need to keep the row (polyline) with the most information (i.e. the longest line).
I can select the rows with the same ID as the other dataframe or not select rows with the same ID (see sample below). And reverse that operation. But I can't figure out how to choose the row of the two dataframes that has the bigger length value.
#set up sample data
sample = data.frame(x=c(1:5), length=c(1.2,1.3,1.5,7.2,36.1), ID_obj=c("a3", "4a", "5b", "8b", "a7"))
sample2 = data.frame(x=c(1:5), length=c(15.1,1.3,1.5,17.2,6.1), ID_obj=c("a3", "k6", "9c", "8b", "a7"))
#select the rows with identical values
sample[sample$ID_obj %in% sample2$ID_obj,]
sample2[sample2$ID_obj %in% sample$ID_obj,]
#select rows without duplicates in ID
'%!in%' <- function(x,y)!('%in%'(x,y))
sample[sample$ID_obj %!in% sample2$ID_obj,]
sample2[sample2$ID_obj %!in% sample$ID_obj,]
error<-rbind(sample, sample2[sample2$ID_obj %!in% sample$ID_obj,])
# x length ID_obj
#1 1 1.2 a3
#2 2 1.3 4a
#3 3 1.5 5b
#4 4 7.2 8b#keep 8b from the first set should not have been kept because length is shorter
#5 5 36.1 a7
#21 2 1.3 k6
#31 3 1.5 9c
#this is the result I want to get automatically
final<-rbind(sample[c(2, 3, 5),], sample2[c(1, 2, 3, 4),])#
# x length ID_obj
#2 2 1.3 4a
#3 3 1.5 5b
#5 5 36.1 a7#keep a7 from the first set because length is longer
#1 1 15.1 a3
#21 2 1.3 k6
#31 3 1.5 9c
#4 4 17.2 8b#keep 8b from the second set because length is longer
Upvotes: 0
Views: 183
Reputation: 11066
A bit more more cryptic with base functions, but just as an exercise:
x <- rbind(sample, sample2)
x <- x[order(x$length), ]
x <- do.call(rbind, lapply(split(x, x$ID_obj), tail, n=1))
x
# x length ID_obj
# 4a 2 1.3 4a
# 5b 3 1.5 5b
# 8b 4 17.2 8b
# a3 1 15.1 a3
# a7 5 36.1 a7
# 9c 3 1.5 9c
# k6 2 1.3 k6
Add rownames(x) <- NULL
if you don't want to use ID_obj as row names.
Upvotes: 1
Reputation: 8760
Use the data.table
package for a simplified syntax (and better performance than data.frame
):
sample = data.frame(x=c(1:5), length=c(1.2,1.3,1.5,7.2,36.1), ID_obj=c("a3", "4a", "5b", "8b", "a7"))
sample2 = data.frame(x=c(1:5), length=c(15.1,1.3,1.5,17.2,6.1), ID_obj=c("a3", "k6", "9c", "8b", "a7"))
library(data.table)
setDT(sample) # convert data.frame to data.table "in-place"
setDT(sample2)
x <- rbind(sample, sample2) # combine rows vertically
setorder(x, -length) # order by length descending
x[, head(.SD, 1), by = ID_obj] # output the first row ("head") per ID_obj group
To get the result (in a different order than your expected result):
ID_obj x length
1: a7 5 36.1
2: 8b 4 17.2
3: a3 1 15.1
4: 5b 3 1.5
5: 9c 3 1.5
6: 4a 2 1.3
7: k6 2 1.3
Upvotes: 1