Reputation: 1598
I wonder whether there are more efficient ways to assign values to a new variable in a data frame than using for loops. I have two recent examples:
[1] Getting the normalized Levenshtein distance using the vwr package:
rst34$Levenshtein = rep(0, nrow(rst34))
for (i in 1:nrow(rst34)) {
  rst34$Levenshtein[i] = levenshtein.distance(
    as.character(rst34$target[i]), as.character(rst34$prime[i]))[[1]] /
    max(nchar(as.character(rst34$target[i])), nchar(as.character(rst34$prime[i])))
}
[2] Extracting a substring from another variable:
rst34$Experiment = 'rst4'
for (i in 1:nrow(rst34)) {
  rst34$Experiment[i] = unlist(strsplit(as.character(rst34$subject[i]), '[.]'))[1]
}
Also, I think there should be no difference between the initializations in the two examples:
rst34$Levenshtein = rep(0, nrow(rst34))
rst34$Experiment = 'rst4'
Many thanks!
Upvotes: 0
Views: 208
Reputation: 263461
It only makes sense to apply nchar to a character variable, so the as.character calls are probably not needed:
rst34$Levenshtein <-
  levenshtein.distance(rst34$target, rst34$prime) /
  pmax(nchar(rst34$target), nchar(rst34$prime))
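In case the distance function does not work element-wise on two vectors, here is a minimal sketch of the same idea that swaps in base R's adist() (from utils) instead of vwr, and calls it once per row pair with mapply(). The small data frame is a made-up stand-in for rst34; only the column names come from the question.

```r
# Hypothetical stand-in for rst34; only the column names are from the question.
rst34 <- data.frame(
  target = c("kitten", "flaw", "same"),
  prime  = c("sitting", "lawn", "same"),
  stringsAsFactors = FALSE
)

# as.character() is kept here in case the columns are factors.
target <- as.character(rst34$target)
prime  <- as.character(rst34$prime)

# adist() returns a distance matrix, so take the single cell for each pair.
lev <- mapply(function(a, b) adist(a, b)[1, 1], target, prime, USE.NAMES = FALSE)

# pmax() is the element-wise counterpart of max().
rst34$Levenshtein <- lev / pmax(nchar(target), nchar(prime))
```

This still loops internally through mapply(), but the normalization is done on whole vectors and there is no need to initialize the column first.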
Upvotes: 1
Reputation: 549
Something like this should hopefully do the trick:
rst34$Experiment = sapply(rst34$subject, function(element) {
  unlist(strsplit(as.character(element), '[.]'))[1]
})
I don't have your data, so I couldn't actually test it.
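A fully vectorized alternative may also work, since sub() operates on the whole column at once. This is only a sketch: the sample subject values below are invented, and it assumes the part you want is everything before the first dot.

```r
# Hypothetical stand-in for rst34$subject; the value layout is an assumption.
rst34 <- data.frame(
  subject = c("rst3.101", "rst4.205", "rst4"),
  stringsAsFactors = FALSE
)

# Remove everything from the first literal dot onward.
# Strings without a dot are returned unchanged.
rst34$Experiment <- sub("[.].*$", "", as.character(rst34$subject))
```

Because the assignment covers the whole column, no initialization such as rst34$Experiment = 'rst4' is needed beforehand.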
Upvotes: 1