Reputation: 1475
I need to apply a custom function to all rows in a data.table that has the columns freq (numeric) and ngram (text with the words separated by _). I also supply three constant values, input1gramCount, input2gramCount and input3gramCount, which are not in the data.table.
When I try this, I get the warning:
Warning message:
In if (MatchedLen == 4) { :
the condition has length > 1 and only the first element will be used
It seems to be complaining that 4 is not vectorised, but I want it to be a constant. Any pointers welcome...
# Stupid Backoff
StupidBackoffScore <- function(freq, ngram, input1gramCount, input2gramCount, input3gramCount) {
  matchedLen = str_count(ngram, "_") + 1
  if (matchedLen == 4) {
    score = freq / input3gramCount
  } else if (matchedLen == 3) {
    score = 0.4 * freq / input2gramCount
  } else {
    # must be matchedLen 2
    score = 0.4 * 0.4 * freq / input1gramCount
  }
  return(score)
}
allGrams <- allGrams %>%
  mutate(stupidBOScore = StupidBackoffScore(frequency, ngram, input1gramCount, input2gramCount, input3gramCount))
Upvotes: 0
Views: 61
Reputation: 213
I would do it like this:
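library(data.table)
library(stringr)  # for str_count()
setDT(dt)  # convert dt to a data.table by reference
# number of words in each ngram
dt[, matchedLen := str_count(ngram, "_") + 1]
# ifelse() is vectorised, so each row gets the formula matching its matchedLen
dt[, score := ifelse(matchedLen == 4, freq / input3gramCount,
               ifelse(matchedLen == 3, 0.4 * freq / input2gramCount,
                      0.4 * 0.4 * freq / input1gramCount))]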
For readability, I created matchedLen as a separate column. If you do not need matchedLen, you can delete it after the score is created.
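As for why the original function warns: mutate() hands the function whole columns, so matchedLen == 4 is a logical vector with one element per row, while if expects a single TRUE or FALSE (from R 4.2.0 onwards this is an error rather than a warning). If you would rather keep a function, here is a minimal sketch of a vectorised version. It assumes base R only, so lengths(strsplit(...)) stands in for str_count(); the name StupidBackoffScoreVec, the sample data and the three counts are made up for illustration.

```r
# Vectorised variant of StupidBackoffScore: operates on whole columns at once.
StupidBackoffScoreVec <- function(freq, ngram, input1gramCount, input2gramCount, input3gramCount) {
  # Words per ngram; base-R stand-in for str_count(ngram, "_") + 1
  matchedLen <- lengths(strsplit(ngram, "_", fixed = TRUE))
  # ifelse() evaluates the test element by element, unlike if ()
  ifelse(matchedLen == 4, freq / input3gramCount,
         ifelse(matchedLen == 3, 0.4 * freq / input2gramCount,
                0.4 * 0.4 * freq / input1gramCount))
}

# Hypothetical sample data and counts, for illustration only
allGrams <- data.frame(
  freq  = c(10, 20, 30),
  ngram = c("a_b_c_d", "b_c_d", "c_d"),
  stringsAsFactors = FALSE
)
allGrams$stupidBOScore <- StupidBackoffScoreVec(
  allGrams$freq, allGrams$ngram,
  input1gramCount = 100, input2gramCount = 50, input3gramCount = 25
)
```

The same function should also work inside a data.table call, e.g. dt[, score := StupidBackoffScoreVec(freq, ngram, input1gramCount, input2gramCount, input3gramCount)], and a helper column such as matchedLen can be dropped afterwards with dt[, matchedLen := NULL].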
Upvotes: 3