Reputation: 41
I have two data sets with latitude, longitude, and temperature data. One data set corresponds to a geographic region of interest with the corresponding lat/long pairs that form the boundary and contents of the region (Matrix Dimension = 4518x2)
The other data set contains lat/long and temperature data for a larger region that envelopes the region of interest (Matrix Dimenion = 10875x3).
My question is: How do you extract the appropriate row data (lat, long, temperature) from the 2nd data set that matches the first data set's lat/long data?
I've tried a variety of "for loops," "subset," and "unique" commands but I can't obtain the matching temperature data.
Thanks in advance!
10/31 Edit: I forgot to mention that I'm using "R" to process this data.
The lat/long data for the region of interest was provided as a list of 4,518 files containing the lat/long coordinates in the name of each file:
x<- dir()
lenx<- length(x)
g <- strsplit(x, "_")
coord1 <- matrix(NA,nrow=lenx, ncol=1)
coord2 <- matrix(NA,nrow=lenx, ncol=1)
for(i in 1:lenx) {
coord1[i,1] <- unlist(g)[2+3*(i-1)]
coord2[i,1] <- unlist(g)[3+3*(i-1)]
}
coord1<-as.numeric(coord1)
coord2<-as.numeric(coord2)
coord<- cbind(coord1, coord2)
The lat/long and temperature data was obtained from an NCDF file for with temperature data for 10,875 lat/long pairs:
long<- tempcd$var[["Temp"]]$size[1]
lat<- tempcd$var[["Temp"]]$size[2]
time<- tempcd$var[["Temp"]]$size[3]
proj<- tempcd$var[["Temp"]]$size[4]
temp<- matrix(NA, nrow=lat*long, ncol = time)
lat_c<- matrix(NA, nrow=lat*long, ncol=1)
long_c<- matrix(NA, nrow=lat*long, ncol =1)
counter<- 1
for(i in 1:lat){
for(j in 1:long){
temp[counter,]<-get.var.ncdf(precipcd, varid= "Prcp", count = c(1,1,time,1), start=c(j,i,1,1))
counter<- counter+1
}
}
temp_gcm <- cbind(lat_c, long_c, temp)`
So now the question is how do you remove values from "temp_gcm" that correspond to lat/long data pairs from "coord?"
Upvotes: 4
Views: 3046
Reputation: 937
Noe,
I can think of a number of ways you could do this. The simplest, albeit not the most efficient would be to make use of R's which()
function, which takes a logical argument, while iterating over the data frame which you want to apply the matches to. Of course, this is assuming that there can be at most a single match in the larger data set. Based on your data sets, I would do something like this:
attach(temp_gcm) # adds the temp_gcm column names to the global namespace
attach(coord) # adds the coord column names to the global namespace
matched.temp = vector(length = nrow(coord)) # To store matching results
for (i in seq(coord)) {
matched.temp[i] = temp[which(lat_c == coord1[i] & long_c == coord2[i])]
}
# Now add the results column to the coord data frame (indexes match)
coord$temperature = matched.temp
The function which(lat_c == coord1[i] & long_c == coord2[i])
returns a vector of all rows in the dataframe temp_gcm
which satisfy lat_c
and long_c
matching coord1
and coord2
respectively from row i
in the iteration (NOTE: I'm assuming this vector will only have length 1, i.e. there is only 1 possible match). matched.temp[i]
will then be assigned the value from the column temp
in the dataframe temp_gcm
which satisfied the logical condition. Note that the goal in doing this is that we create a vector which has matched values that correspond by index to the rows of the dataframe coord
.
I hope this helps. Note that this is a rudimentary approach, and I would advise looking up the function merge()
as well as apply()
to do this in a more succinct manner.
Upvotes: 2
Reputation: 41
I added an additional column full of zeros to use as the resultant for an IF statement. "x" is the number of rows in temp_gcm. "y" is the number of columns (representative of time steps). "temp_s" is the standardized temperature data
indicator<- matrix(0, nrow = x, ncol = 1)
precip_s<- cbind(precip_s, indicator)
temp_s<- cbind(temp_s, indicator)
for(aa in 1:x){
current_lat<-latitudes[aa,1] #Latitudes corresponding to larger area
current_long<- longitudes[aa,1] #Longitudes corresponding to larger area
for(ab in 1:lenx){ #Lenx coresponds to nrow(coord)
if(current_lat == coord[ab,1] & current_long == coord[ab,2]) {
precip_s[aa,(y/12+1)]<-1 #y/12+1 corresponds to "indicator column"
temp_s[aa,(y/12+1)]<-1
}
}
}
precip_s<- precip_s[precip_s[,(y/12+1)]>0,] #Removes rows with "0"s remaining in "indcator" column
temp_s<- temp_s[temp_s[,(y/12+1)]>0,]
precip_s<- precip_s[,-(y/12+1)] #Removes "indicator column
temp_s<- temp_s[,-(y/12+1)]
Upvotes: 0