Keon-Woong Moon

Reputation: 226

Regular Expression to extract function arguments in R

I am having trouble extracting function arguments from a string in R.

    x="theme(legend.position='bottom', 
    legend.margin=(t=0,r=0,b=0,l=0,unit='mm'), 
    legend.background=element_rect(fill='red',size=rel(1.5)), 
    panel.background=element_rect(fill='red'),
    legend.position='bottom')"

What I want is:

    [1] legend.position='bottom'
    [2] legend.margin=(t=0,r=0,b=0,l=0,unit='mm')
    [3] legend.background=element_rect(fill='red',size=rel(1.5))
    [4] panel.background=element_rect(fill='red')
    [5] legend.position='bottom'

I tried several regular expressions without success, including the following:

    strsplit(x,",(?![^()]*\\))",perl=TRUE)

Please help me!

Upvotes: 1

Views: 317

Answers (3)

Keon-Woong Moon

Reputation: 226

Thank you for all your advice. I evaluated the expression and extracted the arguments as a character vector. Here is my solution:

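    library(ggplot2)  # theme(), margin(), element_rect(), rel()
    library(stringr)  # str_locate_all(), fixed()

    x <- "theme(legend.margin=margin(t=0,r=0,b=0,l=0,unit='mm'),
    legend.background=element_rect(fill='red',size=rel(1.5)),
    panel.background=element_rect(fill='red'),
    legend.position='bottom')"

    extractArgs <- function(x) {
        # Evaluate the text; return the string "error" if that fails
        # (e.g. invalid syntax or ggplot2 not loaded).
        result <- tryCatch(eval(parse(text = x)), error = function(e) "error")

        # No arguments if evaluation failed or the result has no names.
        if (is.character(result) || length(names(result)) == 0) {
            return(character(0))
        }

        # Start position of each argument name in the text; fixed() makes
        # "." match literally instead of acting as a regex wildcard.
        locs <- str_locate_all(x, fixed(names(result)))
        starts <- unlist(lapply(locs, function(m) m[, "start"]))
        pos <- c(sort(starts), nchar(x) + 1)

        # Each argument ends 2 characters before the next name (dropping
        # ",\n"); the last one stops just before the closing ")".
        args <- character(0)
        for (i in seq_len(length(pos) - 1)) {
            args <- c(args, substring(x, pos[i], pos[i + 1] - 2))
        }
        args
    }

    extractArgs(x)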
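As an alternative sketch (not the approach used above): because this `x` is syntactically valid R, base R's parser can split it into arguments without evaluating anything or loading ggplot2. This assumes R >= 4.0 for `deparse1()`. The arguments are deparsed back to text, so spacing and quote style are normalized rather than copied verbatim, and it would fail with a parse error on the question's original string, since `legend.margin=(t=0,r=0,...)` is not valid R syntax.

```r
# Base R only: parse the text into a call object, but never evaluate it.
x <- "theme(legend.margin=margin(t=0,r=0,b=0,l=0,unit='mm'),
legend.background=element_rect(fill='red',size=rel(1.5)),
panel.background=element_rect(fill='red'),
legend.position='bottom')"

cl <- str2lang(x)                 # a call object: theme(legend.margin = ..., ...)
arg_exprs <- as.list(cl)[-1]      # drop the function name, keep the named arguments

# Turn each argument expression back into text and prefix it with its name.
args <- paste0(names(arg_exprs), "=",
               vapply(arg_exprs, deparse1, character(1)))
print(args)
```

The last element comes back as `legend.position="bottom"`: the parser sees the structure, but the original formatting is lost.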

Upvotes: 0

Tim Biegeleisen

Reputation: 522386

I think the best answer here might be not to use a regex to parse your function call at all. As the name implies, regular expressions describe regular languages. Your function call is not regular, because it contains nested parentheses. I currently see a maximum nesting depth of two, but who knows whether that could get deeper at some point.
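A small sketch illustrating the point with the question's own pattern: the lookahead `(?![^()]*\))` only checks whether a `)` appears before the next `(`, so it cannot tell nesting levels apart. It splits inside `element_rect(fill='red',size=rel(1.5))` and fails to split before the final argument, because the outer `)` of `theme(` follows it.

```r
# The string from the question, built explicitly so the line breaks are visible.
x <- paste0(
    "theme(legend.position='bottom', \n",
    "legend.margin=(t=0,r=0,b=0,l=0,unit='mm'), \n",
    "legend.background=element_rect(fill='red',size=rel(1.5)), \n",
    "panel.background=element_rect(fill='red'),\n",
    "legend.position='bottom')"
)

# Split on commas that are NOT followed by a ")" before any "(".
pieces <- strsplit(x, ",(?![^()]*\\))", perl = TRUE)[[1]]
print(pieces)
# pieces[4] is "size=rel(1.5))": a nested argument was split off, and the last
# piece still contains both panel.background and the final legend.position.
```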

I would recommend writing a simple parser instead. You can use a stack to keep track of parentheses, and split off an argument only at a comma where all parentheses have been closed, meaning you are not in the middle of an argument. The one exception is the very first, outer parenthesis of the call itself.
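A minimal sketch of such a parser, assuming arguments are separated by commas and strings use `'` or `"` without escaped quotes. Because only one bracket type is tracked, a depth counter stands in for the stack; the function name `split_top_level_args()` is hypothetical. The outer `theme(` and its closing `)` are stripped first, so the counter starts at zero.

```r
# Hypothetical helper: split a call's argument list at top-level commas only.
split_top_level_args <- function(call_text) {
    # Keep only the text between the first "(" and the last ")".
    open  <- as.integer(regexpr("(", call_text, fixed = TRUE))
    close <- max(gregexpr(")", call_text, fixed = TRUE)[[1]])
    inner <- substr(call_text, open + 1, close - 1)

    chars <- strsplit(inner, "")[[1]]
    depth <- 0L     # number of currently open "(" (the "stack" height)
    quote <- ""     # active quote character, or "" when outside a string
    start <- 1L
    out <- character(0)

    for (i in seq_along(chars)) {
        ch <- chars[i]
        if (nzchar(quote)) {
            if (ch == quote) quote <- ""   # closing quote (escapes not handled)
        } else if (ch == "'" || ch == '"') {
            quote <- ch                    # entering a string literal
        } else if (ch == "(") {
            depth <- depth + 1L            # push
        } else if (ch == ")") {
            depth <- depth - 1L            # pop
        } else if (ch == "," && depth == 0L) {
            out <- c(out, substr(inner, start, i - 1))  # top-level comma
            start <- i + 1L
        }
    }
    out <- c(out, substr(inner, start, nchar(inner)))   # final argument
    trimws(out)
}

x <- paste0(
    "theme(legend.position='bottom', \n",
    "legend.margin=(t=0,r=0,b=0,l=0,unit='mm'), \n",
    "legend.background=element_rect(fill='red',size=rel(1.5)), \n",
    "panel.background=element_rect(fill='red'),\n",
    "legend.position='bottom')"
)
res <- split_top_level_args(x)
print(res)
```

Unlike an evaluation- or parse-based approach, this works on the question's string even though `legend.margin=(t=0,...)` is not valid R.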

Upvotes: 1

pirs

Reputation: 2463

Sorry, I have to go to work and will continue later, but for now here is my partial solution:

    theme\(([a-z.]*=['a-z]*)|([a-z._]*=[a-z0-9=,'_.()]*)*\,\)?

It only misses the last part.

Here is the regex101 page: https://regex101.com/r/BZpcW0/2

See you later.
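One possible way to finish the regex route, sketched here as an assumption rather than as pirs's final answer: PCRE, which R uses with `perl = TRUE`, supports recursive subpatterns, which go beyond strictly regular expressions. In the pattern below, group 1 matches a balanced `( ... )` and calls itself via `(?1)` for nested parentheses; each match is a run of non-comma characters, quoted strings, or balanced groups. It assumes single-quoted strings without escaped quotes.

```r
x <- paste0(
    "theme(legend.position='bottom', \n",
    "legend.margin=(t=0,r=0,b=0,l=0,unit='mm'), \n",
    "legend.background=element_rect(fill='red',size=rel(1.5)), \n",
    "panel.background=element_rect(fill='red'),\n",
    "legend.position='bottom')"
)

# Strip the outer "theme(" ... ")" so only the argument list remains.
open  <- as.integer(regexpr("(", x, fixed = TRUE))
close <- max(gregexpr(")", x, fixed = TRUE)[[1]])
inner <- substr(x, open + 1, close - 1)

# Group 1 = balanced parentheses; (?1) recurses into group 1 for nesting.
pattern <- "(?:[^,()']|'[^']*'|(\\((?:[^()']|'[^']*'|(?1))*\\)))+"
args <- trimws(regmatches(inner, gregexpr(pattern, inner, perl = TRUE))[[1]])
print(args)
```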

Upvotes: 0
