Reputation: 47
I would like to assign a segment-ID to longitude and latitude values that fall within a certain range of minimum/maximum long and lat values that are stored in a different data frame of a different length. My data looks like this: Data frame with observations:
head(obs)
longitude latitude
52.06264 6.412816
52.06097 6.413106
51.06097 6.413346
54.06097 6.413276
51.06089 6.413114
52.05444 6.413094
Data frame with ranges and segment-ID:
head(seg)
segment lon_max lon_min lat_max lat_min
01a 6.857822 6.857476 51.05837 51.03489
01b 6.858979 6.857834 51.03433 50.99901
01c 6.860019 6.858982 51.99836 51.96330
01d 6.860960 6.860050 51.96277 51.92718
01e 6.862294 6.860979 51.92657 51.89125
01f 6.863179 6.862301 51.89059 51.85562
For each point of observation I would like to know within which 'segment' it falls, so I would ideally end up with something like this:
longitude latitude segment
52.03464 6.458816 1a
52.05667 6.416606 1a
51.06097 6.446346 1b
54.03757 6.413276 1c
51.06089 6.422114 1b
52.34243 6.413094 1a
I have tried to do this using just latitude, but I get an error message, due to the different length of the vectors.
obs[['segment']] <- for (i in obs$latitude) {
if (i>=seg$lat_min & i<=seg$lat_max) {
obs$segment=seg$segment
} else {
obs$segment='NA'}
}
Error in `$<-.data.frame`(`*tmp*`, "segment", value = 1:118) :
replacement has 118 rows, data has 10284
I realise why this can't work because it doesn't match row by row, but I don't know how to do this. How can I match each pair of latitude and longitude row by row with the min/max values until I find the range into which it fits and assign the correct segment_ID?
Thanks in advance!
Upvotes: 1
Views: 1546
Reputation: 662
Your longitude and latitude were mixed up across both data frames.
Also, your example data makes it impossible to match your obs and seg since all of your obs of latitude for instance, are less than the look up vals for lat_min.
Despite all that, this should work. You are trying to do a look up table.
#create this so we have validation data
newline <- c( 51.05837, 6.857822)
newobs <- rbind(obs, newline)
library(sqldf)
looked_up<-function(data, lookup){
data<-sqldf("select A.*,B.segment from
data A left join lookup B
ON (A.longitude >= B.lon_min and A.longitude
<= B.lon_max and A.latitude >= B.lat_min and A.latitude
<= B.lat_max) ")
data
}
looked_up(newobs, seg)
#RESULTS
longitude latitude segment
1 52.06264 6.412816 <NA>
2 52.06097 6.413106 <NA>
3 51.06097 6.413346 <NA>
4 54.06097 6.413276 <NA>
5 51.06089 6.413114 <NA>
6 52.05444 6.413094 <NA>
7 51.05837 6.857822 01a
I found this helpful when answering your question.
http://shashiasrblog.blogspot.com/2014/01/excel-style-vlookup-and-rangelookup-in-r.html
Upvotes: 2
Reputation: 2960
Given a vector x
with components "longitude" and "latitude", the function f
uses which.max
to find the appropriate row in seg
. Then apply(obs,1,f)
is the vector of length nrow(obs)
which contains the row numbers of the appropriate segments in seg
:
obs <- read.table( header = TRUE,
text =
"latitude longitude
52.06264 6.412816
51.90089 6.861084
52.06097 6.413106
51.06097 6.413346
54.06097 6.413276
51.04097 6.857576
51.06089 6.413114
51.95089 6.860084
52.05444 6.413094" )
seg <- read.table( header = TRUE,
stringsAsFactors = FALSE,
text =
"segment lon_max lon_min lat_max lat_min
01a 6.857822 6.857476 51.05837 51.03489
01b 6.858979 6.857834 51.03433 50.99901
01c 6.860019 6.858982 51.99836 51.96330
01d 6.860960 6.860050 51.96277 51.92718
01e 6.862294 6.860979 51.92657 51.89125
01f 6.863179 6.862301 51.89059 51.85562")
f <- function(x)
{
which.max( c( ( seg["lon_min"] <= c(x["longitude"]) ) &
( seg["lon_max"] > c(x["longitude"]) ) &
( seg["lat_min"] <= c(x["latitude"]) ) &
( seg["lat_max"] > c(x["latitude"]) ),
TRUE ) )
}
X <- cbind( obs, segment = seg$segment[apply(obs,1,f)] )
Result:
> X
latitude longitude segment
1 52.06264 6.412816 <NA>
2 51.90089 6.861084 01e
3 52.06097 6.413106 <NA>
4 51.06097 6.413346 <NA>
5 54.06097 6.413276 <NA>
6 51.04097 6.857576 01a
7 51.06089 6.413114 <NA>
8 51.95089 6.860084 01d
9 52.05444 6.413094 <NA>
Upvotes: 1