user227710
user227710

Reputation: 3194

grab all numeric data in final two columns, the position of which varies by rows

I have a sample data as follows:

dt1 <- setDT(structure(list(V1 = c(301L, 301L, 301L, 301L, 301L), V2 = 1:5, 
    V3 = c(61950L, 61951L, 61953L, 155220L, 155218L), V4 = c("i", 
    "you", "you", "we", "they"), V5 = c("believe", "think", "are", 
    "laugh", "smile"), V6 = c("we", "they", "okay", "490", "490"
    ), V7 = c("can", "500", "with", "31", "31"), V8 = c("use", 
    "32", "that", "", ""), V9 = c("datatable", "", "500", "", 
    ""), V10 = c("always", "", "32", "", ""), V11 = c("500", 
    "", "", "", ""), V12 = c("32", "", "", "", "")), .Names = c("V1", 
"V2", "V3", "V4", "V5", "V6", "V7", "V8", "V9", "V10", "V11", 
"V12"), row.names = c(NA, -5L), class = "data.frame"))

   V1 V2     V3   V4      V5   V6   V7   V8        V9    V10 V11 V12
1: 301  1  61950    i believe   we  can  use datatable always 500  32
2: 301  2  61951  you   think they  500   32                         
3: 301  3  61953  you     are okay with that       500     32        
4: 301  4 155220   we   laugh  490   31                              
5: 301  5 155218 they   smile  490   31  

I want it to convert to something like below:

    V1 V2     V3                               newcol1 newcol2 newcol3
1: 301  1  61950 I believe we can use datatable always     500      32
2: 301  2  61951                        you think they     500      32
3: 301  3  61953                you are okay with that     500      32
4: 301  4 155220                              we laugh     490      31
5: 301  5 155218                            they smile     490      31

Mechanism:

Please suggest only data.table solution.

Upvotes: 1

Views: 35

Answers (1)

Frank
Frank

Reputation: 66819

You can do

rowid_vars = c("V1","V2","V3")
melt(dt1, id=rowid_vars)[value!="", .(
  nc1 = paste(value[-(.N-1:0)], collapse=" "), 
  nc2 = as.integer(value[.N-1]), 
  nc3 = as.integer(value[.N])
), by=rowid_vars]


    V1 V2     V3                                   nc1 nc2 nc3
1: 301  1  61950 i believe we can use datatable always 500  32
2: 301  2  61951                        you think they 500  32
3: 301  3  61953                you are okay with that 500  32
4: 301  4 155220                              we laugh 490  31
5: 301  5 155218                            they smile 490  31

I guess there is some way you could have read in the data so as to avoid this problem, but I don't know how.

Upvotes: 4

Related Questions