B. Davis

Reputation: 3441

Adding values based on levels of a factor

I have this reproducible data.frame representing UTM locations for five individuals (IndID), each of which has 20 locations:

EDIT: The different answers seem to result from running my for() loop on the unsorted data.frame. I have added code to arrange the df by IndID and now get the same answers as you.

library(plyr)

set.seed(123)
Data <- data.frame(IndID = rep(c("AAA", "BBB", "CCC", "DDD", "EEE"), 20),
                   UTM_E = sample(482000:509000, 100),
                   UTM_N = sample(4780000:4810500, 100))

Data <- arrange(Data, IndID)

And I also have this table containing a single Start location for each individual.

set.seed(123)
Start <- data.frame(IndID = c("AAA", "BBB", "CCC", "DDD", "EEE"),
                    UTM_E = sample(482000:509000, 5),
                    UTM_N = sample(4780000:4810500, 5))

For each level of IndID I want to apply the following calculation to add a new column to Data. That is, for the rows where Data$IndID matches Start$IndID, I want to compute:

Data$NewValue = ((((Data$UTM_E - Start$UTM_E)/1000)^2) + (((Data$UTM_N - Start$UTM_N)/1000)^2))

While I know this is possible with the following for() loop and formatting code, I suspect there is a vectorized approach that would be much cleaner and more efficient.

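# Note: this relies on Data being sorted by IndID and on the rows of Start
# being in the same order as unique(Data$IndID)
Inds <- unique(Data$IndID)
NewValue <- list()
for (i in seq_along(Inds)){
    NewValue[[i]] = ((((Data$UTM_E[Data$IndID == Inds[i]] - Start$UTM_E[i])/1000)^2) + 
            (((Data$UTM_N[Data$IndID == Inds[i]] - Start$UTM_N[i])/1000)^2))
}

# Flatten the per-individual results back into a single column
Data$NewValue <- unlist(NewValue)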

head(Data)
tail(Data)

Any suggestions on how to 'vectorize' the above for() loop would be appreciated.

Upvotes: 2

Views: 55

Answers (2)

jeremycg

Reputation: 24945

We can use merge to make one data.frame, then vectorise from there:

data2 <- merge(Data, Start, by = "IndID")
data2$NewValue <- ((data2$UTM_E.x - data2$UTM_E.y)/1000)^2 + 
                  ((data2$UTM_N.x - data2$UTM_N.y)/1000)^2
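
As a variation on the same idea, here is a minimal base R sketch that skips the merge and looks up each row's start coordinates with match(). It assumes every IndID in Data appears exactly once in Start; idx is a name introduced here only for illustration. Unlike merge, this leaves the row order of Data untouched.

```r
# Rebuild the question's data so the sketch runs on its own
set.seed(123)
Data <- data.frame(IndID = rep(c("AAA", "BBB", "CCC", "DDD", "EEE"), 20),
                   UTM_E = sample(482000:509000, 100),
                   UTM_N = sample(4780000:4810500, 100))
set.seed(123)
Start <- data.frame(IndID = c("AAA", "BBB", "CCC", "DDD", "EEE"),
                    UTM_E = sample(482000:509000, 5),
                    UTM_N = sample(4780000:4810500, 5))

# For each row of Data, find the row of Start with the same IndID
# (assumption: each IndID occurs exactly once in Start)
idx <- match(Data$IndID, Start$IndID)

# Squared distance (in km^2) from each location to that individual's start
Data$NewValue <- ((Data$UTM_E - Start$UTM_E[idx]) / 1000)^2 +
                 ((Data$UTM_N - Start$UTM_N[idx]) / 1000)^2
```

Because the lookup is by IndID rather than by position, it should give the same values whether or not Data has been sorted first.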

Upvotes: 2

David Arenburg

Reputation: 92282

I would recommend using data.table's binary join and update-by-reference capabilities for this task:

library(data.table)
setkey(setDT(Data), IndID)[Start, NewValue := ((UTM_E - i.UTM_E)/1e3)^2 + 
                                              ((UTM_N - i.UTM_N)/1e3)^2] 

Note that @jeremycg and I are getting different results from yours. It seems you have an error in your calculations.


The idea here is to key by the common column, then perform a binary join and, while joining, update the NewValue column in place using :=. The i. prefix before a column name distinguishes the Start column from the column of the same name in Data.
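
To make the role of the i. prefix easier to see, here is a small sketch on made-up toy data (three locations, two individuals; the coordinates are invented for illustration and are not from the question). It uses the on = argument instead of setkey(), which should be equivalent for this purpose.

```r
library(data.table)

# Toy data: hypothetical coordinates chosen so the arithmetic is easy to follow
Data  <- data.table(IndID = c("AAA", "AAA", "BBB"),
                    UTM_E = c(1000, 3000, 5000),
                    UTM_N = c(2000, 2000, 9000))
Start <- data.table(IndID = c("AAA", "BBB"),
                    UTM_E = c(0, 5000),
                    UTM_N = c(2000, 5000))

# Join Start onto Data by IndID and add NewValue to Data by reference.
# UTM_E / UTM_N refer to Data's columns; i.UTM_E / i.UTM_N refer to Start's.
Data[Start, on = "IndID",
     NewValue := ((UTM_E - i.UTM_E) / 1e3)^2 + ((UTM_N - i.UTM_N) / 1e3)^2]

print(Data)
```

Each row of Data picks up the start coordinates of its own IndID, so no intermediate merged table is created.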

Upvotes: 2
