Chris

Reputation: 1475

Applying Custom Function to Data.Table

I need to apply a custom function to every row of a data.table that has the columns freq (numeric) and ngram (text, with words separated by _). I also pass in three constants, input1gramCount, input2gramCount and input3gramCount, which are not columns of the data.table.

When I run the code below, I get this warning:

Warning message:
In if (MatchedLen == 4) { :
the condition has length > 1 and only the first element will be used

It seems to be complaining that 4 is not vectorised, but I want it to be a constant. Any pointers welcome...

# Stupid Backoff
StupidBackoffScore <- function(freq, ngram, input1gramCount, input2gramCount, input3gramCount) {
    matchedLen = str_count(ngram, "_") + 1
    if (matchedLen == 4) {
        score = freq / input3gramCount
    } else if (matchedLen == 3) {
        score = 0.4 * freq / input2gramCount
    } else {
        # must be matchedLen 2
        score = 0.4 * 0.4 * freq / input1gramCount
    }
    return(score)
}

allGrams <- allGrams %>%
    mutate(stupidBOScore = StupidBackoffScore(frequency, ngram, input1gramCount, input2gramCount, input3gramCount))

Upvotes: 0

Answers (1)

maop

Reputation: 213

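The constant 4 is not the problem. if() tests only a single value, but mutate() passes the whole column to your function, so matchedLen == 4 is a vector of TRUE/FALSE values and only its first element is used. Use the vectorised ifelse() instead. I would do it like this: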

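library(data.table)
library(stringr)  # for str_count()

setDT(dt)  # dt is your table (allGrams in the question), converted by reference
# number of words in each n-gram = number of underscores + 1
dt[, matchedLen := str_count(ngram, "_") + 1]
# ifelse() is vectorised, so each row gets the branch matching its own matchedLen
dt[, score := ifelse(matchedLen == 4, freq / input3gramCount,
                     ifelse(matchedLen == 3, 0.4 * freq / input2gramCount,
                            0.4 * 0.4 * freq / input1gramCount))]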

For readability, I created matchedLen as a separate column. If you do not need it afterwards, you can drop it once the score is created with dt[, matchedLen := NULL].
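
If you would rather keep the scoring logic in a function, as in the question, the same idea can be sketched by making the function itself vectorised. This is only a minimal sketch: the example rows and the three count values are made up for illustration, and the column is assumed to be called freq as described in the question.

```r
library(data.table)
library(stringr)

# Vectorised variant of the question's function: ifelse() evaluates the
# condition element by element, so whole columns can be passed in.
StupidBackoffScore <- function(freq, ngram, input1gramCount, input2gramCount, input3gramCount) {
    matchedLen <- str_count(ngram, "_") + 1
    ifelse(matchedLen == 4, freq / input3gramCount,
           ifelse(matchedLen == 3, 0.4 * freq / input2gramCount,
                  0.4 * 0.4 * freq / input1gramCount))
}

# Hypothetical data and counts, for illustration only
allGrams <- data.table(
    freq  = c(10, 6, 4),
    ngram = c("a_b_c_d", "b_c_d", "c_d")
)
input1gramCount <- 100
input2gramCount <- 50
input3gramCount <- 20

# Add the score column by reference
allGrams[, stupidBOScore := StupidBackoffScore(freq, ngram, input1gramCount,
                                               input2gramCount, input3gramCount)]
print(allGrams)
```

Because the function now accepts vectors, it should also work unchanged inside mutate(). In recent R versions (4.2.0 and later) the original if() call stops with an error rather than a warning, so the vectorised form is needed either way.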

Upvotes: 3
