Reputation: 143
I need to edit a data.frame in R where some variables are strings in the format [xx xx xxx]. I tried the gsub function, without success.
Example:
aux = '1233,[9 087],03/10/1951,[437 ab 345] ,"ab c", [ 001 ab ]'
gsub("\\[(.*),(.*)\\]","[\\1 \\2]", aux)
Objective: replace spaces with commas, but only inside the brackets.
"1233,[9,087],03/10/1951,[437,ab,345] ,\"ab c\", [001,ab]"
...but the result of the gsub code above is:
[1] "1233,[9 087],03/10/1951,[437 ab 345] ,\"ab c\", [ 001 ab,]"
Note that the runs of spaces vary in length. The idea is to replace every run of spaces inside brackets "[]" with a comma ",", except the spaces before the first and after the last character.
How can I do it?
Upvotes: 2
Views: 1628
Reputation: 29647
Doing this in 2 steps.
And less awesome than the solution from regex-master Wiktor.
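Some assumptions were made for the sake of simplicity: the separators are plain spaces (not general whitespace, \s) and the items inside the brackets consist only of word characters (\w).
aux = "1233,[9 087],03/10/1951,[437 ab 345] ,\"ab c\", [ 001 ab ]"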
# remove the spaces after a "[" or before a "]"
result = gsub("(?<=\\[) +| +(?=\\])", "", aux, perl=TRUE)
# find a "[". Reset and look for spaces followed by word characters.
# And replace those matches by a comma and the word characters
result = gsub("(?:\\[ *\\w+\\K|\\G) +(\\w+)", ",\\1", result, perl=TRUE)
cat(result, "\n")
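Since the question is about a data.frame, here is a minimal sketch of how the two steps above might be wrapped and applied to a whole column. The function name brackets_to_commas, the data frame df, and its column names are made up for illustration; the two patterns are copied unchanged from the code above, so the same assumptions (plain spaces, word characters only) still apply.

```r
# Hypothetical wrapper around the two gsub() steps shown above
brackets_to_commas <- function(x) {
  # Step 1: drop spaces right after "[" or right before "]"
  x <- gsub("(?<=\\[) +| +(?=\\])", "", x, perl = TRUE)
  # Step 2: replace each remaining run of spaces inside the brackets with a comma
  gsub("(?:\\[ *\\w+\\K|\\G) +(\\w+)", ",\\1", x, perl = TRUE)
}

# Made-up data frame; the column names are assumptions for this sketch
df <- data.frame(
  id  = 1:2,
  raw = c('1233,[9 087],03/10/1951,[437 ab 345] ,"ab c", [ 001 ab ]',
          'x y,[ a  b ]'),
  stringsAsFactors = FALSE
)

# gsub() is vectorised, so the whole column is processed in one call
df$clean <- brackets_to_commas(df$raw)
print(df$clean)
```

Text outside the brackets, such as "ab c" or "x y", is left untouched because the second pattern only starts matching after a "[".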
Upvotes: 1
Reputation: 626689
Assuming the bracketed substrings whose spaces you need to replace contain no nested or other square brackets, you may use a PCRE regex with gsub:
aux = '1233,[9 087],03/10/1951,[437 ab 345] ,"ab c", [ 001 ab ]'
res = gsub("(?:\\G(?!^)|\\[\\s*)[^][\\s]*\\K\\s++(?!])(?=[^][]*])", ",", aux, perl=TRUE)
cat(res, "\n")
## => 1233,[9,087],03/10/1951,[437,ab,345] ,"ab c", [ 001,ab ]
Here is an explanation:
(?:\G(?!^)|\[\s*) - the location after the last successful match (\G(?!^)) or the [ and zero or more whitespaces
[^][\s]* - 0+ chars other than ], [ and whitespaces
\K - a match reset operator
\s++ - 1+ whitespaces matched possessively (no backtracking into the pattern, and the next negative lookahead will be only checked after the last whitespace matched)
(?!]) - there must be no ] immediately to the right of the current location
(?=[^][]*]) - there must be 0+ chars other than [ and ] and then a ] immediately to the right of the current location

If you consider a non-base R approach, I can recommend gsubfn:
library(gsubfn)
rx <- "\\[([^][]+)]"
aux = '1233,[9 087],03/10/1951,[437 ab 345] ,"ab c", [ 001 ab ]'
gsubfn(rx, function(g1) paste0("[",gsub("\\s+", ",", trimws(g1)),"]"), aux)
## => [1] "1233,[9,087],03/10/1951,[437,ab,345] ,\"ab c\", [001,ab]"
Here, \\[([^][]+)] matches substrings that start with [, then have 1+ chars other than [ and ], and then ]. Once these matches are found, the Group 1 subvalue is trimmed with trimws() and all 1+ whitespace chunks are replaced with a comma (with gsub("\\s+", ",", trimws(g1))).
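If installing gsubfn is not an option, the same match-then-transform idea can probably be expressed in base R with gregexpr() and the regmatches<- replacement function. This is only a sketch under the same assumption as above (no nested brackets); the variable names res, m, hit and inner are illustrative.

```r
aux <- '1233,[9 087],03/10/1951,[437 ab 345] ,"ab c", [ 001 ab ]'
res <- aux

# Locate every [...] chunk that contains no nested brackets
m <- gregexpr("\\[[^][]+\\]", res, perl = TRUE)

# Rewrite each matched chunk and write it back in place
regmatches(res, m) <- lapply(regmatches(res, m), function(hit) {
  if (length(hit) == 0) return(hit)                 # no [...] chunk in this element
  inner <- trimws(substr(hit, 2, nchar(hit) - 1))   # drop the brackets and the padding
  paste0("[", gsub("\\s+", ",", inner), "]")        # whitespace runs become commas
})

cat(res, "\n")
## => 1233,[9,087],03/10/1951,[437,ab,345] ,"ab c", [001,ab]
```

As with the gsubfn version, the padding spaces after [ and before ] are removed by trimws(), so the result matches the output requested in the question.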
Upvotes: 3