Replacing semicolon for comma in csv using regex in python

Question

I'm working with a .csv file and, as always, it has format problems. In this case it's a semicolon-separated table, but one column sometimes contains semicolons itself, like this:

code;summary;sector;sub_sector
1;fishes;2;2
2;agriculture; also fishes;1;2
3;fishing. Extraction;  animals;2;2

So there are three cases:

no semicolon -> no problem
word character(non-numeric), semicolon, whitespace, word character(non-numeric)
word character(non-numeric), semicolon, 2xwhitespace, word character(non-numeric)

I turned the .csv into a .txt and then imported it as a string and then I compiled this regex:

re.compile(r'([^\d\W]);\s+([^\d\W])', re.S)

This should do the job. I almost managed to replace those semicolons with commas by doing the following:

import re

def replace_comma(match):
    # Swap the semicolon inside the matched text for a comma
    text = match.group()
    return text.replace(';', ',')

regex = re.compile(r'([^\d\W]);\s+([^\d\W])', re.S)

# string holds the full contents of the file
string2 = string.split('\n')

for n, i in enumerate(string2):
    if len(re.findall(r'([^\d\W]);(\s+)([^\d\W])', i)) >= 1:
        string2[n] = regex.sub(replace_comma, i)

This mostly works, but when there are two whitespace characters after the semicolon, it leaves a \xa0 after the comma. I have two problems with this approach:

It's not very straightforward
Why is it leaving this \xa0 character?

Do you know any better way to approach this?

Thanks

Edit: My desired output would be:

code;summary;sector;sub_sector
1;fishes;2;2
2;agriculture, also fishes;1;2
3;fishing. Extraction,  animals;2;2

Edit: Added explanation about turning the file into a string for better manipulation.

Andrej Kesely · Accepted Answer

For this case I wouldn't use regex; split() and rsplit() with the maxsplit= parameter are enough:

data = '''1;fishes;2;2
2;agriculture; also fishes;1;2
3;fishing. Extraction;  animals;2;2'''

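for line in data.splitlines():
    # Split off the first column (code)
    row = line.split(';', maxsplit=1)
    # Split the last two columns (sector, sub_sector) off from the right
    row = row[:1] + row[-1].rsplit(';', maxsplit=2)
    # What remains in the middle is the summary; replace its semicolons
    row[1] = row[1].replace(';', ',')
    print(';'.join(row))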

Prints:

1;fishes;2;2
2;agriculture, also fishes;1;2
3;fishing. Extraction,  animals;2;2
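
On the \xa0 part of the question: in Python 3, \s on a str pattern also matches the non-breaking space \xa0, and replace(';', ',') only touches the semicolon, so a \xa0 that shows up after the comma was most likely already in the file. If you still prefer a regex, here is a minimal sketch. It assumes, as in the sample, that a stray semicolon always sits between a non-digit word character and whitespace followed by another non-digit word character; the \xa0 in the sample text below is my assumption about what the real file contains.

```python
import re

# Sample rows; the last one deliberately contains a non-breaking space
# (\xa0) after the semicolon. This is an assumption about the real file.
text = (
    "code;summary;sector;sub_sector\n"
    "1;fishes;2;2\n"
    "2;agriculture; also fishes;1;2\n"
    "3;fishing. Extraction; \xa0animals;2;2"
)

# In Python 3, \s on a str pattern also matches \xa0.
assert re.fullmatch(r"\s", "\xa0") is not None

# Lookbehind/lookahead match only the semicolon itself, so the
# surrounding characters (including any \xa0) are left untouched.
pattern = re.compile(r"(?<=[^\W\d]);(?=\s+[^\W\d])")
fixed = pattern.sub(",", text)

# Optional: turn non-breaking spaces into ordinary spaces.
cleaned = fixed.replace("\xa0", " ")

print(cleaned)
```

Note that this pattern would miss a summary such as "fishes; 2 boats", because the text after the semicolon starts with a digit. The split()/rsplit() approach above does not have that limitation, since it only relies on the column positions.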
