Reputation: 67

Reading a CSV that sometimes contain multiple whitespaces

I am new to Python and I am messing around with some data that I need to have for a project.

I want to read a CSV and write a cleaner version of it to process later on.

['509,1', '22-10-2018', '05:00', '', '', '11473809', '', '', '', '', '290318']
['509,1', '22-10-2018', '15:00', '', '', '', '', '', '27076', '', '', '', '', '', '', '', '400']

The problem is that the text file sometimes has more spaces in a row and sees it as a new column.

509,1 29-08-2018 12:00   22034905     307257
509,1 29-08-2018 14:00          0          0
509,1 29-08-2018 15:00          0          0
509,1 29-08-2018 16:00          0        433
509,1 29-08-2018 17:00        433        433

How can I skip these columns?

import csv

with open('t:/509.txt', 'r') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=" ")

    with open('t:/509out.csv', 'w') as new_file:
        csv_writer = csv.writer(new_file, delimiter=";")

        for line in csv_reader:
            print(line)
#            csv_writer.writerow(line)

Thanks in advance

Upvotes: 4

Answers (3)

Abdulrahman Bres

Reputation: 2923

I want to read a CSV and write a cleaner version of it to process later on.

If you only want to clean and normalize whitespaces in your CSV file, you can zip whitespaces into single ones using Regular Expressions:

import re

with open('t:/509.txt', 'r') as csv_file:
    text = csv_file.read()

text = re.sub(' +', ' ', text)

Output:

509,1 29-08-2018 12:00 22034905 307257
509,1 29-08-2018 14:00 0 0
509,1 29-08-2018 15:00 0 0
509,1 29-08-2018 16:00 0 433
509,1 29-08-2018 17:00 433 433

Upvotes: 0

nikeros

Reputation: 3379

Simply using regular expressions:

import re

with open("t:/509.txt", 'r') as my_file:
    content = my_file.read()
    lines = [re.findall(r'[^ ]{1,}', x) for x in content.split("\n")]

    with open("t:/509out.csv", 'w') as out_file:
        for l in lines:
            out_file.write(";".join(l) + "\n")
    out_file.close()

my_file.close()

Upvotes: 0

Jim Wright

Reputation: 6058

You can use the skipinitialspace parameter in csv.reader().

When True, whitespace immediately following the delimiter is ignored. The default is False.

csv.reader(csv_file, delimiter=" ", skipinitialspace=True)

Output:

['509,1', '29-08-2018', '12:00', '22034905', '307257']
['509,1', '29-08-2018', '14:00', '0', '0']
['509,1', '29-08-2018', '15:00', '0', '0']
['509,1', '29-08-2018', '16:00', '0', '433']
['509,1', '29-08-2018', '17:00', '433', '433']

Upvotes: 4

Reading a CSV that sometimes contain multiple whitespaces

Answers (3)

Related Questions