Using regex to remove the unnecessary whitespace to get the expected output

Question

I have some unstructured data like this:

test1     21;
 test2  22;
test3    [ 23 ];

I want to remove the unnecessary whitespace and convert each row into a two-item list. The expected output should look like this:

['test1', '21']
['test2', '22']
['test3', ['23']]

Right now I am using this re.sub call to remove the unnecessary whitespace:

re.sub(r"\s+", " ", z.rstrip('\n').lstrip(' ').rstrip(';')).split(' ')

This collapses each run of whitespace into a single space, which is fine. The problem is the third example: there is whitespace after the opening bracket and before the closing bracket, and I want to remove that as well. With the regex above I am not able to.

This is the output I am currently getting:

['test1', '21']
['test2', '22']
['test3', '[', '23', ']']

You may check the example here on pythontutor.

Wiktor Stribiżew · Accepted Answer

You can use

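import re

x = "test1     21"
y = "     test2  22"
z = "    test3    [ 23 ]"

for a in [x, y, z]:
    # Remove whitespace that follows "[", other whitespace, or the start of
    # the string, and whitespace right before "]"; then split on what is left.
    print(re.sub(r"(?<![^[\s])\s+|\s+(?=])", "", a).split())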


See the Python demo. Output:
['test1', '21']
['test2', '22']
['test3', '[23]']
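
The output above keeps the bracketed value as the string '[23]', whereas the question asked for the nested list ['23']. Below is a minimal sketch of one way to get there, assuming a bracketed value holds whitespace-separated items. The names parse_row and CLEANUP are hypothetical and not part of the original answer.

```python
import re

# Remove whitespace that follows "[", other whitespace, or the start of the
# string, and whitespace that comes right before "]".
CLEANUP = re.compile(r"(?<![^\[\s])\s+|\s+(?=\])")

def parse_row(line):
    """Hypothetical helper: turn one raw line into a two-item list."""
    # Drop surrounding whitespace and the trailing semicolon, then clean up.
    cleaned = CLEANUP.sub("", line.strip().rstrip(";"))
    name, value = cleaned.split(maxsplit=1)
    # Assumption: a value wrapped in [...] is a list of whitespace-separated items.
    if value.startswith("[") and value.endswith("]"):
        value = value[1:-1].split()
    return [name, value]

rows = ["test1     21;", " test2  22;", "test3    [ 23 ];"]
result = [parse_row(r) for r in rows]
for row in result:
    print(row)
```

Splitting with maxsplit=1 keeps everything after the name together, so a bracketed value with several items would still end up in one nested list.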

Details:

(?<![^[\s])\s+ - one or more whitespace characters that are preceded by a [ char, a whitespace character, or the start of the string
| - or
\s+(?=]) - one or more whitespace characters that are followed by a ] char.
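
To see what each alternative contributes, the two halves can be applied one after the other. This is only an illustrative sketch: the names LEADING and BEFORE_CLOSE are made up here, and the brackets are escaped for readability, which does not change what the patterns match.

```python
import re

# First alternative: whitespace preceded by "[", by whitespace,
# or by the start of the string.
LEADING = r"(?<![^\[\s])\s+"
# Second alternative: whitespace immediately followed by "]".
BEFORE_CLOSE = r"\s+(?=\])"

line = "    test3    [ 23 ]"

step1 = re.sub(LEADING, "", line)
print(repr(step1))    # 'test3 [23 ]'

step2 = re.sub(BEFORE_CLOSE, "", step1)
print(repr(step2))    # 'test3 [23]'

print(step2.split())  # ['test3', '[23]']
```

The first space after test3 survives because it is preceded by a non-whitespace character other than [, so exactly one separator is left for split() to work with.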
