Dunk Chawannut
Dunk Chawannut

Reputation: 105

Convert JSON embedded in CSV file fields to data frame

I'm pretty new to R. I have done some search but still have trouble with this issue.

This is an example of my data (CSV) out of 100,000 records

{id_outlet_delivery:22015,name:Branch A}6 ,

{id_outlet_delivery:22016,name:Branch B}7 ,

{id_outlet_delivery:22017,name:Branch C}8

What I want is to convert this to data frame so that I can perform some analysis. (I don't want the number 6,7,8 as in the example to be in the data frame).

Upvotes: 0

Views: 172

Answers (1)

akrun
akrun

Reputation: 887173

Here is a method using regex

 library(stringr)
  v1 <- '{id_outlet_delivery:22015,name:Branch A}6 ,
    {id_outlet_delivery:22016,name:Branch B}7 ,
    {id_outlet_delivery:22017,name:Branch C}8'

  nm1 <- str_extract_all(v1, perl("[[:alpha:]_]+(?=:)"))[[1]][1:2]
  val <- str_extract_all(v1, perl("(?<=:)[[:alnum:] ]+(?=\\})?"))[[1]]

Or you could use stringi which would be faster

  library(stringi)
  nm1 <- stri_extract_all_regex(v1, "[[:alpha:]_]+(?=:)")[[1]][1:2]
  val <- stri_extract_all_regex(v1, "(?<=:)[[:alnum:] ]+(?=\\}|,)")[[1]]


  indx <- c(TRUE, FALSE)
  dat <- setNames(data.frame(as.numeric(val[indx]), val[!indx],
                                     stringsAsFactors=FALSE), nm1)

  dat
  #  id_outlet_delivery     name
  #1              22015 Branch A
  #2              22016 Branch B
  #3              22017 Branch C

Upvotes: 1

Related Questions