Pravellika

Reputation: 182

Split function in R

Below is a sample string:

/site/50?ret=html&limit=8&phint=eid%3D283&phint=tcat%3D53159&phint=bin%3D1.99&phint=iid%3D301468384280&phint=type%3Duser&phint=pid%3D&phint=meta%3D11450&phint=gid%3D2&phint=inid%3D3&phint=tps%3D&phint=crm%3D3&phint=css%3D6&phint=cg%3D50a8abe714b0a7e37480bbe0fe9fe01e

Basically, I need two levels of splitting. The first is to split on "&phint=" and take out all the strings between consecutive occurrences, which I have done. My output is now:

[[1]]
 [1] "/site/50?ret=html&limit=8"                                                                
 [2] "eid%3D283"                                                                                
 [3] "tcat%3D53159"                                                                             
 [4] "bin%3D1.99"                                                                               
 [5] "iid%3D301468384280" 
[[3]]
 [1] "/site/17001?ret=html&limit=8"          "eid%3D278"                             "tcat%3D26395"                          "bin%3D0.0"                            
 [5] "iid%3D0"                               "type%3Duser"                           "pid%3D"                                "meta%3D26395"                         
 [9] "gid%3D1"                               "inid%3D5"                              "tps%3D"                                "crm%3D6"                              
[13] "css%3D10"

It is a list. Now I need to split each element in two wherever I find %3D. For example, "eid%3D283" should be written into two separate data frame columns:

eid in one column
283 in the other column

This should be done for all "n" entries in the one-column matrix. It became a one-column matrix after the first-level split.

Expected output:
Key           Value
eid            283
tcat           53159
bin            1.99
and so on..

Any help is appreciated.

Thanks, Pravellika J

Upvotes: 0

Views: 1366

Answers (1)

akrun

Reputation: 886938

You could try

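# Split each row on '&phint=', drop the leading URL piece, then split on '%3D'.
# Pieces with no value after '%3D' (e.g. "pid%3D") become a row of two NAs.
res <- do.call(rbind.data.frame,
       lapply(strsplit(as.character(dat1$Col), '&phint='), function(x) 
           do.call(rbind,lapply(strsplit(x[-1], '%3D'), function(y) 
               if(length(y)<2) rep(NA,2) else y))))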


colnames(res) <- c('Key', 'Value')
head(res,2)
#   Key Value
#1  eid   283
#2 tcat 53159
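
Since `dat1` is not shown in the question, here is a minimal self-contained sketch of the same two-level split on a shortened copy of the sample string. The data frame `dat1` and its column `Col` are assumed names, and `fixed = TRUE` is used so that both delimiters are matched literally. Unlike the code above, this variant pads short pieces with `NA` instead of replacing them, so an empty value such as "pid%3D" keeps its key.

```r
# Shortened sample string from the question; `dat1` and `Col` are assumed names.
dat1 <- data.frame(
  Col = "/site/50?ret=html&limit=8&phint=eid%3D283&phint=tcat%3D53159&phint=bin%3D1.99&phint=pid%3D",
  stringsAsFactors = FALSE
)

# First level: split on the literal "&phint=" and drop the leading URL piece.
parts <- strsplit(dat1$Col, "&phint=", fixed = TRUE)[[1]][-1]

# Second level: split each piece on the literal "%3D".
kv <- strsplit(parts, "%3D", fixed = TRUE)

# Pad every piece to length 2 so a piece with an empty value keeps its key.
kv <- lapply(kv, `length<-`, 2)

# Bind the pieces into rows and name the two columns.
res <- as.data.frame(do.call(rbind, kv), stringsAsFactors = FALSE)
colnames(res) <- c("Key", "Value")
res
```

The values stay as character strings here; converting `Value` with `as.numeric` would be a separate step if numbers are needed.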

Update

Based on the dput output, the dataset has elements such as

 se32%3DD%3Dc31        

Therefore, it may be better to have more than two columns to accommodate these cases:

 lst1 <- lapply(strsplit(as.character(dat1$Col), '&phint='), 
              function(x) strsplit(x[-1], '%3D'))
 lMax <- max(rapply(lst1, length))

 res <- do.call(rbind.data.frame,lapply(lst1, function(x) 
           do.call(rbind,lapply(x, `length<-`, lMax))))
 head(res,3)
 #   V1    V2   V3
 #1  eid   283 <NA>
 #2 tcat 53159 <NA>
 #3  bin  1.99 <NA>
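
The padding step relies on the replacement function `length<-`, which extends a vector to a given length and fills the new positions with `NA`. A small sketch on two pieces taken from this thread shows the effect in isolation (the variable names `pieces` and `padded` are only illustrative):

```r
# Split one two-part and one three-part element on the literal "%3D".
pieces <- strsplit(c("eid%3D283", "se32%3DD%3Dc31"), "%3D", fixed = TRUE)

# The longest piece decides the number of columns.
lMax <- max(lengths(pieces))

# `length<-`(x, n) extends x to length n, filling with NA.
padded <- lapply(pieces, `length<-`, lMax)

# Every piece now has the same length, so rbind gives a rectangular matrix.
do.call(rbind, padded)
```

Padding first means `rbind` never has to recycle a shorter row, which is what would otherwise happen with pieces of unequal length.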

Update2

If we need to include one more column based on the "first" element after the strsplit:

lst1 <- lapply(strsplit(as.character(dat1$Col), '&phint='),function(x) {
 x1 <-  as.numeric(sub("^.*/site/([0-9]+).*", "\\1",x[1]))
 x2 <- strsplit(x[-1], '%3D')
 c(x1,x2)})

 lMax <- max(rapply(lst1, length))
 res <- do.call(rbind,lapply(lst1, function(x) 
     setNames(data.frame(x[1],do.call(rbind,lapply(x[-1], `length<-`, 
             lMax))), paste0('V', seq_len(lMax+1)))))
 head(res,3)
 #  V1   V2    V3   V4
 #1 50  eid   283 <NA>
 #2 50 tcat 53159 <NA>
 #3 50  bin  1.99 <NA>
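
The extra column comes from the `sub` call, which keeps only the digits that follow `/site/`. A short sketch on the two leading pieces shown in the question (the names `first` and `site_id` are only illustrative):

```r
# Leading pieces as they appear in the question's first-level output.
first <- c("/site/50?ret=html&limit=8", "/site/17001?ret=html&limit=8")

# Capture the run of digits after "/site/" and discard the rest of the string.
site_id <- as.numeric(sub("^.*/site/([0-9]+).*", "\\1", first))
site_id
```

If a string does not contain `/site/` followed by digits, `sub` returns it unchanged, so `as.numeric` would give `NA` with a warning for that entry.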

Upvotes: 3
