Below is a sample string:
/site/50?ret=html&limit=8&phint=eid%3D283&phint=tcat%3D53159&phint=bin%3D1.99&phint=iid%3D301468384280&phint=type%3Duser&phint=pid%3D&phint=meta%3D11450&phint=gid%3D2&phint=inid%3D3&phint=tps%3D&phint=crm%3D3&phint=css%3D6&phint=cg%3D50a8abe714b0a7e37480bbe0fe9fe01e
Basically, I need two levels of splitting. The first is to split the string on "&phint=" and take out everything between consecutive occurrences, which I have already done. My output is now:
[[1]]
[1] "/site/50?ret=html&limit=8"
[2] "eid%3D283"
[3] "tcat%3D53159"
[4] "bin%3D1.99"
[5] "iid%3D301468384280"
[[3]]
[1] "/site/17001?ret=html&limit=8" "eid%3D278" "tcat%3D26395" "bin%3D0.0"
[5] "iid%3D0" "type%3Duser" "pid%3D" "meta%3D26395"
[9] "gid%3D1" "inid%3D5" "tps%3D" "crm%3D6"
[13] "css%3D10"
It is a list.
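For reference, a first-level split like the one above can be done with strsplit(). This is a minimal sketch, assuming the URL is held in a single character value named s (a hypothetical name; the sample string is shortened here). fixed = TRUE is an addition that makes "&phint=" match literally rather than as a regular expression.

```r
# Shortened version of the sample string from the question
s <- "/site/50?ret=html&limit=8&phint=eid%3D283&phint=tcat%3D53159&phint=bin%3D1.99"
# strsplit() returns a list with one character vector per input string;
# fixed = TRUE matches "&phint=" literally instead of as a regular expression
parts <- strsplit(s, "&phint=", fixed = TRUE)
str(parts)
```

Because strsplit() always returns a list, applying it to a vector of URLs gives one list element per URL, which is the structure shown above.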
Next, I need to split each element wherever it contains %3D, into two parts.
Example:
"eid%3D283"
should be written into two separate data frame columns:
eid in one column
283 in the other column
This has to be done for all n rows of the one-column matrix that the first-level split produced.
Expected output:
Key Value
eid 283
tcat 53159
bin 1.99
and so on.
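One way to do this second-level split for a single list element is to cut each string at its first %3D. This is a minimal sketch, assuming kv (a hypothetical name) holds the elements left after dropping the URL prefix, and that every element contains %3D (regexpr() returns -1 when there is no match).

```r
# Elements left after dropping the URL prefix (hypothetical input)
kv <- c("eid%3D283", "tcat%3D53159", "bin%3D1.99")
# Position of the first "%3D" in each element (-1 if absent)
pos <- regexpr("%3D", kv, fixed = TRUE)
res <- data.frame(Key   = substr(kv, 1, pos - 1),    # text before "%3D"
                  Value = substring(kv, pos + 3),    # text after "%3D" (3 chars long)
                  stringsAsFactors = FALSE)
res
```

With this approach any further %3D in an element stays inside Value; the Value column is character, so use as.numeric() if numbers are needed.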
Any help is appreciated.
Thanks, Pravellika J
You could try
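# dat1: the asker's data frame (from dput), assumed to hold one URL per row in Col
res <- do.call(rbind.data.frame,
  lapply(strsplit(as.character(dat1$Col), '&phint='), function(x)
    # drop the URL prefix x[1]; split each key/value pair at '%3D'
    do.call(rbind, lapply(strsplit(x[-1], '%3D'), function(y)
      # a pair with an empty value (e.g. "pid%3D") splits into one piece: use NAs
      if (length(y) < 2) rep(NA, 2) else y))))
colnames(res) <- c('Key', 'Value')
head(res, 2)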
# Key Value
#1 eid 283
#2 tcat 53159
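The if (length(y) < 2) guard matters because strsplit() drops a trailing empty string: an element with an empty value such as "pid%3D" splits into the single piece "pid", and rbind() would then recycle it into both columns. A small illustration on two sample elements (fixed = TRUE is added here to match %3D literally):

```r
pieces <- strsplit(c("eid%3D283", "pid%3D"), "%3D", fixed = TRUE)
# strsplit() drops the trailing empty string, so "pid%3D" gives only "pid"
lengths(pieces)
# Without a guard, rbind() recycles the lone "pid" into both columns
bad <- do.call(rbind, pieces)
# The answer's guard turns any short piece into a row of two NAs
good <- do.call(rbind, lapply(pieces, function(y)
  if (length(y) < 2) rep(NA, 2) else y))
bad
good
```

Note that the guard replaces the key as well, so such pairs show up as an all-NA row.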
Based on the dput output, the dataset has elements such as
se32%3DD%3Dc31
that contain more than one %3D. It may therefore be better to use more than two columns to accommodate these cases:
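# split every row into a list of character vectors (one per key/value pair)
lst1 <- lapply(strsplit(as.character(dat1$Col), '&phint='),
               function(x) strsplit(x[-1], '%3D'))
# the longest split anywhere in the data sets the number of columns
lMax <- max(rapply(lst1, length))
# `length<-` pads shorter pieces with NA so every row has lMax entries
res <- do.call(rbind.data.frame, lapply(lst1, function(x)
  do.call(rbind, lapply(x, `length<-`, lMax))))
head(res, 3)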
# V1 V2 V3
#1 eid 283 <NA>
#2 tcat 53159 <NA>
#3 bin 1.99 <NA>
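The padding step relies on the replacement function `length<-`, which lengthens a vector by filling the new slots with NA. A small sketch on three sample elements; it uses lengths() on a flat list in place of the answer's rapply() over the nested list, which gives the same maximum for a single row.

```r
# Sample elements, one of which contains two "%3D" separators
pieces <- strsplit(c("eid%3D283", "se32%3DD%3Dc31", "pid%3D"), "%3D", fixed = TRUE)
lMax <- max(lengths(pieces))          # the longest split sets the column count
# `length<-`(x, n) extends x to length n, filling the new slots with NA
padded <- lapply(pieces, `length<-`, lMax)
m <- do.call(rbind, padded)
m
```

Computing lMax over the whole dataset, as the answer does, keeps every row's matrix the same width so the final rbind lines up.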
If you also need an extra column built from the first element after the strsplit (the site number, e.g. 50 from "/site/50?ret=html&limit=8"):
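lst1 <- lapply(strsplit(as.character(dat1$Col), '&phint='), function(x) {
  # site number from the URL prefix, e.g. 50 from "/site/50?ret=html&limit=8"
  x1 <- as.numeric(sub("^.*/site/([0-9]+).*", "\\1", x[1]))
  x2 <- strsplit(x[-1], '%3D')
  c(x1, x2)  # list: site number first, then the split key/value pairs
})
lMax <- max(rapply(lst1, length))
res <- do.call(rbind, lapply(lst1, function(x)
  # prepend the site number (V1) to the NA-padded key/value matrix
  setNames(data.frame(x[1], do.call(rbind, lapply(x[-1], `length<-`, lMax))),
           paste0('V', seq_len(lMax + 1)))))
head(res, 3)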
# V1 V2 V3 V4
#1 50 eid 283 <NA>
#2 50 tcat 53159 <NA>
#3 50 bin 1.99 <NA>
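The site number comes from a regular expression with a capture group: sub() replaces the whole string with just the digits that follow "/site/". A quick check on the two URL prefixes shown in the question:

```r
prefix <- c("/site/50?ret=html&limit=8", "/site/17001?ret=html&limit=8")
# ([0-9]+) captures the digits after "/site/"; "\\1" keeps only that capture
site <- as.numeric(sub("^.*/site/([0-9]+).*", "\\1", prefix))
site
```

If a prefix does not contain /site/ followed by digits, sub() returns it unchanged and as.numeric() gives NA with a warning.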