Remove multiple punctuation marks scattered throughout a string

Question

These are the default column names produced.

columnNames
[1] "chain:1.theta[1]" "chain:1.theta[2]" "chain:1.theta[3]" "chain:1.theta[4]"

I would like columnNames to be:

[1] "theta1" "theta2" "theta3" "theta4"

I would like to do this using one regular expression. I have tried a few different approaches with no success.

> gsub('chain:[[:digit:]][[:punct:]]', '', columnNames)
[1] "theta[1]" "theta[2]" "theta[3]" "theta[4]"

> gsub('chain:[[:digit:]].\\[|\\]', '', columnNames)
[1] "chain:1.theta[1" "chain:4.theta[2" "chain:1.theta[3" "chain:4.theta[4"

> gsub('(?=.*chain:[[:digit:]][[:punct:]])(?=.*"\\[|\\])', '', columnNames, perl = TRUE)
[1] "chain:1.theta[1]" "chain:4.theta[2]" "chain:1.theta[3]" "chain:4.theta[4]"

> gsub('(?!theta\\[[:digit:]])', '', columnNames, perl = TRUE)
Error in gsub("(?!theta\\[[:digit:]])", "", columnNames, perl = TRUE) :
  invalid regular expression '(?!theta\[[:digit:]])'
In addition: Warning message:
In gsub("(?!theta\\[[:digit:]])", "", columnNames, perl = TRUE) :
  PCRE pattern compilation error
    'POSIX named classes are supported only within a class'
    at '[:digit:]])'

Julius Vainora · Accepted Answer

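> gsub(".*\\.(.*)\\[(\\d+)\\]", "\\1\\2", columnNames)
[1] "theta1" "theta2" "theta3" "theta4"

where .*\\. matches everything up to and including the dot, (.*) captures the name (theta in this case), \\[ and \\] match the literal square brackets, and (\\d+) captures the number between them. The replacement "\\1\\2" keeps only the two captured groups, so the prefix and the brackets are dropped.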
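
As a minimal sketch (not part of the original answer), the capture-group pattern can be checked against the sample data from the question, next to an alternation-based variant that is closer to the asker's own attempts. The variable names byGroups and byAlternation are made up for illustration, and the alternation pattern assumes every name starts with "chain:", one or more digits, and a dot.

```r
# Sample column names copied from the question.
columnNames <- c("chain:1.theta[1]", "chain:1.theta[2]",
                 "chain:1.theta[3]", "chain:1.theta[4]")

# Capture-group approach from the answer: keep the name and the index,
# drop the "chain:1." prefix and the literal square brackets.
byGroups <- gsub(".*\\.(.*)\\[(\\d+)\\]", "\\1\\2", columnNames)

# Alternation approach (illustrative): delete the "chain:<digits>." prefix,
# or a literal "[", or a literal "]", wherever each one occurs.
byAlternation <- gsub("^chain:[[:digit:]]+\\.|\\[|\\]", "", columnNames)

print(byGroups)
# [1] "theta1" "theta2" "theta3" "theta4"

# Both approaches give the same result on this input.
print(identical(byGroups, byAlternation))
# [1] TRUE
```

The alternation form deletes every unwanted piece in a single pass, which is what the second attempt in the question was aiming for. That attempt failed because the "." after the digit class consumed the literal dot, so the first alternative then required a "[" immediately after it and never matched, leaving only the "]" to be removed.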
