maddy kemen
maddy kemen

Reputation: 77

How to parse key value pair of url string in R with multiple conditions

I have a string in the below format:

a <- c("first_name=James(Mr), cust_id=98503(ZZW_LG,WGE,zonaire),
       StartDate=2015-05-20, EndDate=2015-05-20, performance=best")

My aim is to get the final result in a dataframe as below:

first_name   cust_id   start_date    end_date    performance           cust_notes
 James(Mr)     98503   2015-05-20  2015-05-20           best   ZZW_LG,WGE,zonaire

I ran the following code:

a <- c("first_name=James(Mr), cust_id=98503(ZZW_LG,WGE,zonaire),
       StartDate=2015-05-20, EndDate=2015-05-20, performance=best")

split_by_comma <- strsplit(a,",")

split_by_equal <- lapply(split_by_comma,strsplit,"=")

Since the custid had got additional commas and brackets, I am not getting desired result.

Please note that brackets in first name are genuine and needed as it is.

Upvotes: 1

Views: 1658

Answers (3)

Gaurav Sagar
Gaurav Sagar

Reputation: 1

Late reply, but posted it since its very simple to understand and implement without using any additional packages

rawdf = read.csv("<your file path>", header = F, sep = ",", stringsAsFactors = F)
# Get the first row of the dataframe and transpose it into a column of a df
colnames = data.frame(t(rawdf[1,]))

# Split the values of the single column df created above into its key value
# pairs which are separated by '=' and save in a vector
colnames = unlist(strsplit(as.character(colnames$X1), "="))

# Pick up all the odd indexed values from the above vector (all odd places
# are colnames and even places the values associated with them)
colnames = colnames[seq(1,length(colnames),2)]

# Assign the extracted column names from the vector above to your original data frame
colnames(rawdf) = colnames

# Use the regex to extract the value in each field of the original df by
# replacing the 'Key=' pattern present in each field with an empty string 
for(i in 1:dim(rawdf)[2]) rawdf[,i] = gsub(paste(colnames[i],"=",sep=""), "", rawdf[,i])

Upvotes: 0

hrbrmstr
hrbrmstr

Reputation: 78792

If your string format holds true, this might be a quick solution:

library(httr)

a <- c("first_name=James(Mr), cust_id=98503(ZZW_LG,WGE,zonaire), StartDate=2015-05-20, 
        EndDate=2015-05-20, performance=best")

dat <- data.frame(parse_url(sprintf("?%s", gsub(",[[:space:]]+", "&", a)))$query, 
           stringsAsFactors=FALSE)

library(tidyr)
library(dplyr)

mutate(separate(dat, cust_id, into=c("cust_id", "cust_notes"), sep="\\("), 
       cust_notes=gsub("\\)", "", cust_notes))

##   first_name cust_id         cust_notes  StartDate    EndDate performance
## 1  James(Mr)   98503 ZZW_LG,WGE,zonaire 2015-05-20 2015-05-20        best

Extrapolation:

  • gsub(",[[:space:]]+", "&", a) makes the parameters look like a components of a URL query string.
  • sprintf(…) make it look like an actual query string
  • parse_url (from httr) will separate the key/value pairs out and stick them in a list (named query) in the returned list
  • data.frame will, well…
  • separate will split the cust_id column for you at the ( into two columns
  • mutate will remove the ) in the new cust_notes column

Here's the whole thing as a "pipe":

library(httr)
library(tidyr)
library(dplyr)
library(magrittr)

a <- c("first_name=James(Mr), cust_id=98503(ZZW_LG,WGE,zonaire), StartDate=2015-05-20, 
        EndDate=2015-05-20, performance=best")

a %>% 
  gsub(",[[:space:]]+", "&", .) %>% 
  sprintf("?%s", .) %>% 
  parse_url() %>% 
  extract2("query") %>% 
  data.frame(stringsAsFactors=FALSE) %>% 
  separate(cust_id, into=c("cust_id", "cust_notes"), sep="\\(") %>% 
  mutate(cust_notes=gsub("\\)", "", cust_notes))

which matches the extrapolation and is (IMO) easier to follow.

Upvotes: 0

vks
vks

Reputation: 67968

You need to split by this.

,(?![^()]*\\))

You need lookahead.This will not split by , within ().See demo.

https://regex101.com/r/uF4oY4/82

To get desired result use

split_by_comma <- strsplit(a,",(?![^()]*\\))",perl=TRUE)

split_by_equal <- lapply(split_by_comma,strsplit,"=")

Upvotes: 1

Related Questions