Kspr

Reputation: 685

Regexp pattern to remove spaces next to brackets and replace any spaces between words/characters inside the brackets with a single comma

I have strings in a format similar to this:

hello this is an example [ a b c ]

hello this is another example [ cat bird dog elephant ]

Which I want to transform to

hello this is an example [a,b,c]

hello this is another example [cat,bird,dog,elephant]

But I don't understand how to write a regexp pattern that removes the spaces next to the brackets and replaces any run of spaces between the words/characters inside the brackets with a single comma.

How would one create such a pattern?

My current attempt is a chain of regexp replacements.

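m = re.sub(r'\[\s+', '[', s)                    # drop spaces after [
m = re.sub(r'\s+\]', ']', m)                    # drop spaces before ]
m = re.sub(r'\s+', ' ', m)                      # collapse every run of whitespace
m = re.sub(r'\s(?=[^\[\]]*])', ',', m)          # spaces followed by ] with no bracket in between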

Does anyone have a suggestion on how to make it more efficient or cleaner?

Upvotes: 1

Views: 568

Answers (4)

The fourth bird

Reputation: 163217

You can use a negated character class with a single capture group, and then replace 1 or more spaces with a single comma in group 1 and wrap the result in between square brackets.

\[\s*([^][]*?)\s*]

The pattern matches:

  • \[ Match [
  • \s* Match optional leading whitespace chars
  • ( Capture group 1
    • [^][]*? Optionally repeat chars other than [ and ], as few as possible
  • ) Close group 1
  • \s* Match optional trailing whitespace chars
  • ] Match ]
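
To make the breakdown concrete, here is a small sketch of what group 1 captures for each bracket pair. The sample string is made up for illustration; only the pattern comes from this answer.

```python
import re

# Same pattern as in this answer; the sample string is a made-up illustration.
pattern = re.compile(r"\[\s*([^][]*?)\s*]")
sample = "x [ a    b c ] y [ cat  dog   ]"

# With a single capture group, findall returns the text of group 1
# for every bracket pair it finds.
captured = pattern.findall(sample)
print(captured)  # ['a    b c', 'cat  dog']
```

The whitespace next to the brackets is left out of the group, while the inner runs of spaces remain, so they still have to be replaced in a second step.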

See a regex demo with the capture group value and a Python demo.

import re

strings = [
    "hello this is an example [ a    b c ]",
    "hello this is another example [ cat    bird dog elephant   ]"
]

pattern = r"\[\s*([^][]*?)\s*]"
for s in strings:
    print(re.sub(pattern, lambda m: "[{0}]".format(re.sub(r"\s+", ',', m.group(1))), s))

Output

hello this is an example [a,b,c]
hello this is another example [cat,bird,dog,elephant]

Upvotes: 2

Marcel Preda

Reputation: 1205

Below is my solution, with some comments added.

For the second part (replacing the spaces between the square brackets with commas), I would rather use split() and join(); the regex loop is most likely slower.

import re

str1 = 'hello this is an example [ a    b c ]'

str2 = 'hello this is another example [ cat    bird dog elephant   ]'

# remove the SPACES near square brackets
str1 =  re.sub(r'\[\s*(.*\S)\s*\]', r'[\1]', str1)
print(str1)
# replace the SPACES inside the square brackets until no replacement
old_str1 = ''
while old_str1 != str1:
    old_str1 = str1
    str1 =  re.sub(r'\[(\S*)\s+(.*)\]', r'[\1,\2]', str1, count=0)
print(str1)


str2 =  re.sub(r'\[\s*(.*\S)\s*\]', r'[\1]', str2)
print(str2)
old_str2 = ''
while old_str2 != str2:
    old_str2 = str2
    str2 =  re.sub(r'\[(\S*)\s+(.*)\]', r'[\1,\2]', str2, count=0)
print(str2)

Output

hello this is an example [a    b c]
hello this is an example [a,b,c]
hello this is another example [cat    bird dog elephant]
hello this is another example [cat,bird,dog,elephant]
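
The split() and join() idea mentioned at the top of this answer could look roughly like the sketch below. The function name `commas_inside_brackets` is hypothetical, and the sketch assumes that brackets are not nested.

```python
import re

def commas_inside_brackets(text):
    """Hypothetical helper: tidy the contents of every [...] group in text."""
    def fix(match):
        # str.split() with no arguments splits on runs of whitespace and
        # drops empty strings at both ends, so no strip() is needed.
        return "[" + ",".join(match.group(1).split()) + "]"
    # The regex only locates the bracket pairs (assumed not nested);
    # split()/join() does the rest.
    return re.sub(r"\[([^\[\]]*)\]", fix, text)

print(commas_inside_brackets("hello this is an example [ a    b c ]"))
print(commas_inside_brackets("hello this is another example [ cat    bird dog elephant   ]"))
```

This needs a single pass and no loop, and it also handles strings that contain more than one bracket pair.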

Upvotes: 0

RobertG

Reputation: 416

As a first step, you can extract the text between the square brackets. That should make the code more readable:

import re

foo = 'hello this is another example [ cat    bird dog elephant   ]'

# get everything between [ and ]
reg_get_between_square_brackets = re.compile(r'\[(.*)\]')
str_to_replace = reg_get_between_square_brackets.findall(foo)[0]

# replace runs of whitespace with a comma
new_string = re.sub(r'\s+', ',', str_to_replace.strip())  # strip to remove leading/trailing whitespace
print(foo.replace(str_to_replace, new_string))

Outputs:

hello this is another example [cat,bird,dog,elephant]

Upvotes: 1

Rabinzel

Reputation: 7903

I didn't manage to do it with a fancy pattern, but how about this little workaround? Write a pattern that matches everything between the brackets, then deal with that string separately: split it on whitespace, filter out the empty elements (caused by the leading and trailing whitespace) and join it back together into one string separated by commas. Then pass that modified string to re.sub as the replacement for everything between the brackets.

import re

s1 = "hello this is an example [ a    b c ]"
s2 = "hello this is another example [ cat    bird dog elephant   ]"

pattern = r"(?<=\[)(.*)(?=\])"

print(
    re.sub(
        pattern, 
        ','.join(list(filter(None, re.split(r"\s+", re.search(pattern, s1).group(1)))))
        , s1)
)

print(
    re.sub(
        pattern, 
        ','.join(list(filter(None, re.split(r"\s+", re.search(pattern, s2).group(1)))))
        , s2)
)

Output:

hello this is an example [a,b,c]
hello this is another example [cat,bird,dog,elephant]

Upvotes: 1
