Reputation: 3441
I have this reproducible data.frame representing UTM locations for five individuals (IndID), each of which has 20 locations.
EDIT: The different answers seem to have resulted from running my for() loop on the unsorted data.frame. I have added code to arrange the df by IndID and now get the same answers as you.
library(plyr)
set.seed(123)
Data <- data.frame(IndID = rep(c("AAA", "BBB", "CCC", "DDD", "EEE"), 20),
                   UTM_E = sample(482000:509000, 100),
                   UTM_N = sample(4780000:4810500, 100))
Data <- arrange(Data, IndID)
I also have this table containing a single Start location for each individual.
set.seed(123)
Start <- data.frame(IndID = c("AAA", "BBB", "CCC", "DDD", "EEE"),
                    UTM_E = sample(482000:509000, 5),
                    UTM_N = sample(4780000:4810500, 5))
For each level of IndID, I want to apply the following calculation to add a new column to Data. For example, where Data$IndID == Start$IndID, I want to create
Data$NewValue = ((((Data$UTM_E - Start$UTM_E)/1000)^2) + (((Data$UTM_N - Start$UTM_N)/1000)^2))
While I know this is possible with the following for() loop and formatting code, I suspect there is a vectorized approach that would be much cleaner and more efficient.
Inds <- unique(Data$IndID)
NewValue <- list()
for (i in seq_along(Inds)) {
  NewValue[[i]] <- ((Data$UTM_E[Data$IndID == Inds[i]] - Start$UTM_E[i]) / 1000)^2 +
                   ((Data$UTM_N[Data$IndID == Inds[i]] - Start$UTM_N[i]) / 1000)^2
}
Data$NewValue <- c(do.call("cbind", NewValue))
head(Data)
tail(Data)
Any suggestions on how to 'vectorize' the above for() loop would be appreciated.
Upvotes: 2
Views: 55
Reputation: 24945
We can use merge to make one data.frame, then vectorise from there:
data2 <- merge(Data, Start, by = "IndID")
data2$NewValue <- ((data2$UTM_E.x - data2$UTM_E.y) / 1000)^2 +
                  ((data2$UTM_N.x - data2$UTM_N.y) / 1000)^2
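If you would rather not build a merged data.frame with the .x/.y suffixed columns, a base R lookup with match() should give the same numbers. This is only a sketch on small made-up data (the coordinates below are hypothetical, not the question's sample() output), and idx is a helper name I introduced:

```r
# Toy data with hypothetical coordinates; Data is deliberately unsorted
Data <- data.frame(IndID = c("BBB", "AAA", "BBB", "AAA"),
                   UTM_E = c(483000, 484000, 485000, 486000),
                   UTM_N = c(4781000, 4782000, 4783000, 4784000))
Start <- data.frame(IndID = c("AAA", "BBB"),
                    UTM_E = c(482000, 483000),
                    UTM_N = c(4780000, 4781000))

# For every row of Data, find the row of Start with the same IndID
idx <- match(Data$IndID, Start$IndID)

# Squared distance (in km^2) from each location to that individual's start
Data$NewValue <- ((Data$UTM_E - Start$UTM_E[idx]) / 1000)^2 +
                 ((Data$UTM_N - Start$UTM_N[idx]) / 1000)^2

Data$NewValue
```

Because idx lines up row by row with Data, the result does not depend on Data being sorted by IndID, and the original row order is kept.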
Upvotes: 2
Reputation: 92282
I would recommend using data.table's binary join and update-by-reference capabilities for this task:
library(data.table)
setkey(setDT(Data), IndID)[Start, NewValue := ((UTM_E - i.UTM_E) / 1e3)^2 +
                                              ((UTM_N - i.UTM_N) / 1e3)^2]
Note that @jeremycg and I are getting different results from yours. It seems like you have an error in your calculations.
The idea here is to key by the common column, then perform a binary join and, while joining, update the NewValue column in place using :=. The i. prefix before a column name refers to the column from Start (the table passed as the i argument), which distinguishes it from the same-named column in Data.
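To see the i. prefix in action on something small, here is a sketch with made-up coordinates (hypothetical values, not the question's data). It uses the on= argument instead of setkey(), which I am assuming is available in your data.table version (it was added in 1.9.6) and which leaves the row order of Data untouched:

```r
library(data.table)

# Toy data with hypothetical coordinates; Data is deliberately unsorted
Data <- data.table(IndID = c("BBB", "AAA", "BBB", "AAA"),
                   UTM_E = c(483000, 484000, 485000, 486000),
                   UTM_N = c(4781000, 4782000, 4783000, 4784000))
Start <- data.table(IndID = c("AAA", "BBB"),
                    UTM_E = c(482000, 483000),
                    UTM_N = c(4780000, 4781000))

# Update join: UTM_E / UTM_N come from Data, i.UTM_E / i.UTM_N from Start
Data[Start, on = "IndID",
     NewValue := ((UTM_E - i.UTM_E) / 1e3)^2 + ((UTM_N - i.UTM_N) / 1e3)^2]

Data$NewValue
```

Each row of Data is matched to its individual's Start row, so the two BBB rows are compared against the BBB start and the two AAA rows against the AAA start.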
Upvotes: 2