Marcelo Fernandes

Reputation: 143

How to replace comma between brackets in a string?

I need to edit a data.frame in R in which some variables are strings in the format [xx xx xxx]. I tried the gsub function, without success.

Example:

aux = '1233,[9 087],03/10/1951,[437 ab 345] ,"ab c", [ 001     ab ]'
gsub("\\[(.*),(.*)\\]","[\\1 \\2]", aux)

Objective: replace spaces with commas, but only inside the brackets.

"1233,[9,087],03/10/1951,[437,ab,345] ,\"ab c\", [001,ab]"

...but the result of the gsub code above is:

[1] "1233,[9 087],03/10/1951,[437 ab 345] ,\"ab c\", [ 001     ab,]"

Note that the runs of spaces are of irregular length. The idea is to replace every run of spaces inside brackets "[]" with a single comma ",", except for the spaces before the first and after the last character, which should be dropped.

How can I do it?

Upvotes: 2

Views: 1628

Answers (2)

LukStorms

Reputation: 29647

This does it in 2 steps, and it is less awesome than the solution from regex-master Wiktor.

Some assumptions were made for the sake of simplicity.

  • It's just spaces, not other whitespace characters (--> not using \s)
  • Just letters and numbers between those spaces (--> using \w)

aux = "1233,[9 087],03/10/1951,[437 ab 345] ,\"ab c\", [ 001     ab ]"

# remove the spaces after a "[" or before a "]"
result = gsub("(?<=\\[) +| +(?=\\])", "", aux, perl=TRUE)

# match either a "[" plus the first word (then reset the match start with \K),
# or the end of the previous match (\G); then match spaces followed by word characters
# and replace those matches by a comma and the word characters
result = gsub("(?:\\[ *\\w+\\K|\\G) +(\\w+)", ",\\1", result, perl=TRUE)

cat(result, "\n")
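
Since the question is about a data.frame, the two steps above can be wrapped in a function and applied to a whole column, because gsub is vectorized over its input. This is only a sketch: the helper name brackets_to_commas, the data.frame df and its columns raw and clean are made up for illustration, and the same assumptions (plain spaces, word characters only) still apply.

```r
# Hypothetical helper wrapping the two gsub() steps from this answer
brackets_to_commas <- function(x) {
  # step 1: remove the spaces after a "[" or before a "]"
  x <- gsub("(?<=\\[) +| +(?=\\])", "", x, perl = TRUE)
  # step 2: turn the remaining runs of spaces between words inside brackets into commas
  gsub("(?:\\[ *\\w+\\K|\\G) +(\\w+)", ",\\1", x, perl = TRUE)
}

# Hypothetical data.frame with one raw text column
df <- data.frame(raw = c("1233,[9 087]", "[ 001     ab ],x y"),
                 stringsAsFactors = FALSE)
df$clean <- brackets_to_commas(df$raw)
print(df$clean)
```

One caveat: \G also matches at the very start of each string, so a value that begins with spaces followed by a word would get a comma there too.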

An R-Fiddle can be found here

Upvotes: 1

Wiktor Stribiżew

Reputation: 626689

Assuming the spaces you need to replace with a comma have no nested or other square brackets inside, you may use a PCRE regex with gsub:

aux = '1233,[9 087],03/10/1951,[437 ab 345] ,"ab c", [ 001     ab ]'
res = gsub("(?:\\G(?!^)|\\[\\s*)[^][\\s]*\\K\\s++(?!])(?=[^][]*])", ",", aux, perl=TRUE)
cat(res, "\n")
## => 1233,[9,087],03/10/1951,[437,ab,345] ,"ab c", [ 001,ab ]

See the R demo and the regex demo.

Here is an explanation:

  • (?:\G(?!^)|\[\s*) - either the end of the previous successful match (\G(?!^), which rules out the start of the string) or a [ followed by zero or more whitespaces
  • [^][\s]* - 0+ chars other than ], [ and whitespaces
  • \K - a match reset operator
  • \s++ - 1+ whitespaces matched possessively (no backtracking into the pattern, and the next negative lookahead will be only checked after the last whitespace matched)
  • (?!]) - there must be no ] immediately to the right of the current location
  • (?=[^][]*]) - there must be 0+ chars other than [ and ] and then a ] immediately to the right of the current location
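
Because of the (?!]) check, this pattern leaves the whitespace right after [ and right before ] in place (hence [ 001,ab ] in the output). If those spaces should go as well, as in the output requested in the question, one possible follow-up is two more gsub calls. These two extra calls are an addition for illustration, not part of the regex explained above:

```r
aux <- '1233,[9 087],03/10/1951,[437 ab 345] ,"ab c", [ 001     ab ]'

# The single-pass pattern from this answer
rx <- "(?:\\G(?!^)|\\[\\s*)[^][\\s]*\\K\\s++(?!])(?=[^][]*])"
res <- gsub(rx, ",", aux, perl = TRUE)

# Extra clean-up (assumption: padding inside the brackets is unwanted)
res <- gsub("\\[\\s+", "[", res, perl = TRUE)  # whitespace right after "["
res <- gsub("\\s+\\]", "]", res, perl = TRUE)  # whitespace right before "]"
cat(res, "\n")
```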

If you consider a non-base R approach, I can recommend gsubfn:

library(gsubfn)
rx <- "\\[([^][]+)]"
aux = '1233,[9 087],03/10/1951,[437 ab 345] ,"ab c", [ 001     ab ]'
gsubfn(rx, function(g1) paste0("[",gsub("\\s+", ",", trimws(g1)),"]"), aux)
## => [1] "1233,[9,087],03/10/1951,[437,ab,345] ,\"ab c\", [001,ab]"

Here, \\[([^][]+)] matches substrings that start with [, then have 1+ chars other than [ and ] and then ], and once these matches are found, Group 1 subvalue is trimmed with trimws() and all 1+ whitespace chunks are replaced with a comma (with gsub("\\s+", ",", trimws(g1))).
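
If installing a package is not an option, the same match-then-rewrite idea can be sketched in base R with gregexpr and the replacement form of regmatches. This is a rough equivalent rather than a drop-in for gsubfn; the helper name fix_chunk is made up for illustration, and it relies on the same assumption of no nested brackets:

```r
aux <- '1233,[9 087],03/10/1951,[437 ab 345] ,"ab c", [ 001     ab ]'

# Locate every [...] chunk that contains no other square brackets
m <- gregexpr("\\[[^][]+\\]", aux, perl = TRUE)

# Hypothetical helper: strip the brackets, trim, turn whitespace runs into commas
fix_chunk <- function(chunk) {
  inner <- trimws(substr(chunk, 2, nchar(chunk) - 1))
  paste0("[", gsub("\\s+", ",", inner), "]")
}

# Write the rewritten chunks back into a copy of the string
res <- aux
regmatches(res, m) <- lapply(regmatches(res, m), fix_chunk)
cat(res, "\n")
```

Here the brackets are removed with substr instead of a capture group, since regmatches returns the whole match.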

Upvotes: 3
