PeterV

Reputation: 195

New column added to data.table does not reflect correct computed values

I am trying to add two columns to a data.table. The original structure is below:

> aTable
                    word freq
 1: thanks for the follow  612
 2:        the end of the  491
 3:       the rest of the  462
 4:         at the end of  409
 5:        is going to be  359
 6:    for the first time  355
 7:      at the same time  346
 8:      cant wait to see  338
 9:     thank you for the  334
10:     thanks for the rt  321

My code is as follows:

myKeyValfun <- function(line) {
  ret1 = paste(head(strsplit(dtable4G$word,split=" ")[[1]],3), collapse=" ")
  ret2 = tail(strsplit(line,split=" ")[[1]],1)
  return(list(key = ret1, value = ret2))
}

aTable[, c("key","value") := myKeyValfun(word)]

After I run this, only the first row has the correct values. Every other row gets the same key and value as the first row.

See below:

> aTable
                     word freq            key  value
 1: thanks for the follow  612 thanks for the follow
 2:        the end of the  491 thanks for the follow
 3:       the rest of the  462 thanks for the follow
 4:         at the end of  409 thanks for the follow
 5:        is going to be  359 thanks for the follow
 6:    for the first time  355 thanks for the follow
 7:      at the same time  346 thanks for the follow
 8:      cant wait to see  338 thanks for the follow
 9:     thank you for the  334 thanks for the follow
10:     thanks for the rt  321 thanks for the follow

Any ideas?

Adding the expected result as requested by akrun:

> aTable
                     word freq            key  value
 1: thanks for the follow  612 thanks for the follow
 2:        the end of the  491     the end of    the
 3:       the rest of the  462    the rest of    the
 4:         at the end of  409     at the end     of
 5:        is going to be  359    is going to     be
 6:    for the first time  355  for the first   time
 7:      at the same time  346    at the same   time
 8:      cant wait to see  338   cant wait to    see
 9:     thank you for the  334   thank you for   the
10:     thanks for the rt  321  thanks for the    rt

Upvotes: 0

Views: 61

Answers (1)

akrun

Reputation: 887521

If we need to extract the first three words into 'key' and the last word into 'value', one option is sub:

aTable[, c('key', 'value') := list(sub('(.*)\\s+.*', '\\1', word), sub('.*\\s+', '', word))]
aTable
#                     word freq            key  value
# 1: thanks for the follow  612 thanks for the follow
# 2:        the end of the  491     the end of    the
# 3:       the rest of the  462    the rest of    the
# 4:         at the end of  409     at the end     of
# 5:        is going to be  359    is going to     be
# 6:    for the first time  355  for the first   time
# 7:      at the same time  346    at the same   time
# 8:      cant wait to see  338   cant wait to    see
# 9:     thank you for the  334  thank you for    the
#10:     thanks for the rt  321 thanks for the     rt
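
To see why the two patterns split the string where they do, here is a minimal base R sketch on a plain character vector (the two sample strings are taken from the question; the variable names are only for illustration). The greedy `.*` runs up to the last stretch of whitespace, so the first call keeps everything before the last word and the second call keeps only the last word. Note that this gives "everything except the last word", which equals the first three words only when each entry has exactly four words.

```r
# Sample strings copied from the question's table
word <- c("thanks for the follow", "at the end of")

# Greedy (.*) captures everything before the last whitespace run
key <- sub("(.*)\\s+.*", "\\1", word)

# Greedy .* followed by \\s+ removes everything up to and including the last whitespace run
value <- sub(".*\\s+", "", word)

print(key)
print(value)
```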

Alternatively, we can use tstrsplit:

aTable[, c('key', 'value') := {
  tmp <- tstrsplit(word, ' ')               # one list element per word position
  list(do.call(paste, tmp[1:3]), tmp[[4]])  # words 1-3 pasted together, word 4 on its own
}]
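
As for why the original function fills every row with the first row's values: inside `j`, `word` is passed as the whole column, and `strsplit(...)[[1]]` keeps only the split of the first element, so the function returns a single key and a single value that `:=` then recycles down the column. The function also reads `dtable4G$word` rather than its `line` argument. A minimal sketch of a vectorized rewrite follows; `myKeyValfunVec` is a hypothetical name, the three-row table is a cut-down copy of the question's data, and it assumes every entry is a non-empty string.

```r
library(data.table)

# Cut-down copy of the question's data
aTable <- data.table(
  word = c("thanks for the follow", "the end of the", "the rest of the"),
  freq = c(612, 491, 462)
)

# Hypothetical vectorized version: handle every element of `line`, not just the first
myKeyValfunVec <- function(line) {
  parts <- strsplit(line, split = " ", fixed = TRUE)
  list(
    # first three words of each entry, joined back with spaces
    key   = vapply(parts, function(p) paste(head(p, 3), collapse = " "), character(1)),
    # last word of each entry (assumes each entry has at least one word)
    value = vapply(parts, function(p) tail(p, 1), character(1))
  )
}

aTable[, c("key", "value") := myKeyValfunVec(word)]
print(aTable)
```

Unlike the tstrsplit version, this sketch does not require every entry to have exactly four words.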

Upvotes: 3
