Reputation: 105
I'm pretty new to R. I have done some search but still have trouble with this issue.
This is an example of my data (CSV) out of 100,000 records
{id_outlet_delivery:22015,name:Branch A}6 ,
{id_outlet_delivery:22016,name:Branch B}7 ,
{id_outlet_delivery:22017,name:Branch C}8
What I want is to convert this to data frame so that I can perform some analysis. (I don't want the number 6,7,8 as in the example to be in the data frame).
Upvotes: 0
Views: 172
Reputation: 887173
Here is a method using regex
library(stringr)
v1 <- '{id_outlet_delivery:22015,name:Branch A}6 ,
{id_outlet_delivery:22016,name:Branch B}7 ,
{id_outlet_delivery:22017,name:Branch C}8'
nm1 <- str_extract_all(v1, perl("[[:alpha:]_]+(?=:)"))[[1]][1:2]
val <- str_extract_all(v1, perl("(?<=:)[[:alnum:] ]+(?=\\})?"))[[1]]
Or you could use stringi
which would be faster
library(stringi)
nm1 <- stri_extract_all_regex(v1, "[[:alpha:]_]+(?=:)")[[1]][1:2]
val <- stri_extract_all_regex(v1, "(?<=:)[[:alnum:] ]+(?=\\}|,)")[[1]]
indx <- c(TRUE, FALSE)
dat <- setNames(data.frame(as.numeric(val[indx]), val[!indx],
stringsAsFactors=FALSE), nm1)
dat
# id_outlet_delivery name
#1 22015 Branch A
#2 22016 Branch B
#3 22017 Branch C
Upvotes: 1