I am trying to add two columns to a data.table. The original structure is below:
> aTable
word freq
1: thanks for the follow 612
2: the end of the 491
3: the rest of the 462
4: at the end of 409
5: is going to be 359
6: for the first time 355
7: at the same time 346
8: cant wait to see 338
9: thank you for the 334
10: thanks for the rt 321
My code is as follows:
myKeyValfun <- function(line) {
ret1 = paste(head(strsplit(dtable4G$word,split=" ")[[1]],3), collapse=" ")
ret2 = tail(strsplit(line,split=" ")[[1]],1)
return(list(key = ret1, value = ret2))
}
aTable[, c("key","value") := myKeyValfun(word)]
After running this, I noticed that only the first row has the correct values. Every other row gets the same key and value as the first row.
See below:
> aTable
word freq key value
1: thanks for the follow 612 thanks for the follow
2: the end of the 491 thanks for the follow
3: the rest of the 462 thanks for the follow
4: at the end of 409 thanks for the follow
5: is going to be 359 thanks for the follow
6: for the first time 355 thanks for the follow
7: at the same time 346 thanks for the follow
8: cant wait to see 338 thanks for the follow
9: thank you for the 334 thanks for the follow
10: thanks for the rt 321 thanks for the follow
Any ideas?
Here is the expected result, as requested by akrun:
> aTable
word freq key value
1: thanks for the follow 612 thanks for the follow
2: the end of the 491 the end of the
3: the rest of the 462 the rest of the
4: at the end of 409 at the end of
5: is going to be 359 is going to be
6: for the first time 355 for the first time
7: at the same time 346 at the same time
8: cant wait to see 338 cant wait to see
9: thank you for the 334 thank you for the
10: thanks for the rt 321 thanks for the rt
If we need to extract the first three words into 'key' and the last word into 'value', one option is sub:
aTable[, c('key', 'value') := list(sub('(.*)\\s+.*', '\\1', word), sub('.*\\s+', '', word))]
aTable
# word freq key value
# 1: thanks for the follow 612 thanks for the follow
# 2: the end of the 491 the end of the
# 3: the rest of the 462 the rest of the
# 4: at the end of 409 at the end of
# 5: is going to be 359 is going to be
# 6: for the first time 355 for the first time
# 7: at the same time 346 at the same time
# 8: cant wait to see 338 cant wait to see
# 9: thank you for the 334 thank you for the
#10: thanks for the rt 321 thanks for the rt
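The two regular expressions above can be tried on a plain character vector in base R, without data.table. This is a minimal sketch that uses a few phrases from the question's 'word' column. Both patterns depend on greedy matching: '(.*)\\s+.*' captures everything before the last run of whitespace, and '.*\\s+' consumes everything up to and including it, which leaves only the final word.

```r
# A few phrases copied from the question's 'word' column
word <- c("thanks for the follow", "the end of the", "is going to be")

# Greedy '(.*)' extends as far as it can while still leaving room for '\\s+.*',
# so group 1 holds every word except the last one
key <- sub('(.*)\\s+.*', '\\1', word)

# Greedy '.*\\s+' matches through the last whitespace run; deleting that match
# keeps only the final word
value <- sub('.*\\s+', '', word)

print(data.frame(word, key, value))
```

Because sub is vectorised over 'word', each row gets its own key and value. That is why it works directly inside `:=`.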
Or we can use tstrsplit:
aTable[, c('key', 'value') := {
tmp <- tstrsplit(word, ' ')
list(do.call(paste, tmp[1:3]), tmp[[4]])}]
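The question's function returns the wrong result because strsplit() returns a list with one element per phrase, and `[[1]]` keeps only the first phrase. Each of ret1 and ret2 therefore has length 1, and `:=` recycles that single value down every row. The base R sketch below shows this and then applies the same head/tail idea to every element. The names 'parts' and 'first_only' are illustrative only. Unlike the tstrsplit version, which assumes exactly four words per phrase (`tmp[1:3]`, `tmp[[4]]`), `head(w, -1)` drops only the last word, so phrases of any length are handled.

```r
word <- c("thanks for the follow", "the end of the", "cant wait to see")

# strsplit() is vectorised: one character vector of words per phrase
parts <- strsplit(word, split = " ")

# What the question's function effectively does: [[1]] keeps only the first
# phrase, giving a single value that would be recycled to every row
first_only <- paste(head(parts[[1]], 3), collapse = " ")

# Per-phrase version: everything but the last word as key, last word as value
key   <- vapply(parts, function(w) paste(head(w, -1), collapse = " "), character(1))
value <- vapply(parts, function(w) tail(w, 1), character(1))

print(data.frame(word, key, value))
```

In data.table, the same two vectors could then be assigned with `aTable[, c("key", "value") := list(key, value)]`, provided they were computed from `aTable$word`.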