Reputation: 101
I have a CSV file that contains information about people:
name,age,height
Maria,25,172
George,45,180,
Peter,23,179,
The problem is that some lines have an extra comma at the end and some don't (this happens because the data was fetched from the internet with urlopen
in another Python script, which processes the raw data).
I tried to write some code to fix this, but I couldn't get it to work. Here is what I've written:
import re
data = open('file.csv').read()
new_data = re.sub(r'\W$', '', data)
print(new_data)
But this code removes only the last comma in the whole document. I also tried writing a loop that goes through the file and processes each line, but I couldn't get that to work either. Please tell me what I'm doing wrong.
Upvotes: 2
Views: 1331
Reputation: 5524
This is simple enough that you don't really need a regex
(and it's probably faster not to use one).
Here's what I would do:
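with open("file.csv", 'r') as f:
    # Each line still ends with "\n", so drop it before testing for the trailing comma.
    lines = [line.rstrip("\n") for line in f]
newLines = [line[:-1] if line.endswith(",") else line for line in lines]
Then all you need to do is write the lines back to the file, joined with newlines.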
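For completeness, here is a minimal sketch of the full round trip, including the write-back step. The helper name strip_trailing_comma and the choice to overwrite the file in place are my own assumptions for illustration, not something the question requires.

```python
def strip_trailing_comma(path):
    """Remove one trailing comma from each line of the file at `path` (hypothetical helper)."""
    with open(path, "r") as f:
        # Drop the newline so that endswith(",") can see the comma.
        lines = [line.rstrip("\n") for line in f]
    cleaned = [line[:-1] if line.endswith(",") else line for line in lines]
    with open(path, "w") as f:
        # Put the newlines back when writing.
        f.write("\n".join(cleaned) + "\n")
    return cleaned
```

Calling strip_trailing_comma("file.csv") rewrites the file and returns the cleaned lines. If you would rather keep the original, write to a different path instead.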
Upvotes: 0
Reputation: 9753
The problem is that the whole file is handled as a single string, and by default $
matches only at the end of the string (or just before a final newline).
You would be better off using re.sub(r'\W\n', '\n', data).
You can also do that without a regex: new_data = data.replace(',\n', '\n'), which is probably faster.
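As a quick check, here is a small sketch run on the sample data from the question, inlined as a string so it is self-contained. It also shows a third option, the re.MULTILINE flag, which makes $ match at the end of every line. The variable names are only illustrative.

```python
import re

# Sample data from the question; note the last line has no trailing newline.
data = "name,age,height\nMaria,25,172\nGeorge,45,180,\nPeter,23,179,"

# With re.MULTILINE, $ matches at the end of each line, not only at the end of the string.
with_regex = re.sub(r",$", "", data, flags=re.MULTILINE)

# str.replace only catches commas that are followed by a newline,
# so a comma at the very end of the data has to be handled separately.
without_regex = data.replace(",\n", "\n")
if without_regex.endswith(","):
    without_regex = without_regex[:-1]

print(with_regex == without_regex)
```

The extra endswith check matters only when the file does not end with a newline; otherwise the plain replace is enough.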
Upvotes: 4