Reputation: 67
I am new to Python and I am messing around with some data that I need to have for a project.
I want to read a CSV and write a cleaner version of it to process later on.
['509,1', '22-10-2018', '05:00', '', '', '11473809', '', '', '', '', '290318']
['509,1', '22-10-2018', '15:00', '', '', '', '', '', '27076', '', '', '', '', '', '', '', '400']
The problem is that the text file sometimes has more spaces in a row and sees it as a new column.
509,1 29-08-2018 12:00 22034905 307257
509,1 29-08-2018 14:00 0 0
509,1 29-08-2018 15:00 0 0
509,1 29-08-2018 16:00 0 433
509,1 29-08-2018 17:00 433 433
How can I skip these columns?
import csv
with open('t:/509.txt', 'r') as csv_file:
csv_reader = csv.reader(csv_file, delimiter=" ")
with open('t:/509out.csv', 'w') as new_file:
csv_writer = csv.writer(new_file, delimiter=";")
for line in csv_reader:
print(line)
# csv_writer.writerow(line)
Thanks in advance
Upvotes: 4
Views: 921
Reputation: 2923
I want to read a CSV and write a cleaner version of it to process later on.
If you only want to clean and normalize whitespaces in your CSV file, you can zip whitespaces into single ones using Regular Expressions:
import re
with open('t:/509.txt', 'r') as csv_file:
text = csv_file.read()
text = re.sub(' +', ' ', text)
Output:
509,1 29-08-2018 12:00 22034905 307257
509,1 29-08-2018 14:00 0 0
509,1 29-08-2018 15:00 0 0
509,1 29-08-2018 16:00 0 433
509,1 29-08-2018 17:00 433 433
Upvotes: 0
Reputation: 3379
Simply using regular expressions:
import re
with open("t:/509.txt", 'r') as my_file:
content = my_file.read()
lines = [re.findall(r'[^ ]{1,}', x) for x in content.split("\n")]
with open("t:/509out.csv", 'w') as out_file:
for l in lines:
out_file.write(";".join(l) + "\n")
out_file.close()
my_file.close()
Upvotes: 0
Reputation: 6058
You can use the skipinitialspace parameter in csv.reader()
.
When True, whitespace immediately following the delimiter is ignored. The default is False.
csv.reader(csv_file, delimiter=" ", skipinitialspace=True)
Output:
['509,1', '29-08-2018', '12:00', '22034905', '307257']
['509,1', '29-08-2018', '14:00', '0', '0']
['509,1', '29-08-2018', '15:00', '0', '0']
['509,1', '29-08-2018', '16:00', '0', '433']
['509,1', '29-08-2018', '17:00', '433', '433']
Upvotes: 4