Failing to deliminite csv-file

I have a csv-file that doesn't delimit. Screenshot of csv-file. This means that all the data stays in row[0], and does not divide into 6 columns. Does anybody know how to solve this issue?

import csv

n=1048576

id=[]*n
a=[]*n
date=[]*n
b=[]*n
c=[]*n

with open('C:\\Users\\andsc\\data_1.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    line_count = 0
    for row in csv_reader:
        id[line_count] = row[0]
        a[line_count] = row[1]
        date[line_count] = row[2]
        b[line_count] = row[3]
        c[line_count] = row[4]
        
        line_count += 1

Upvotes: 0

Views: 73

Answers (2)

BoarGules
BoarGules

Reputation: 16942

Firstly, don't do this:

id=[]*n
a=[]*n
...etc...

What you are trying to do is emulate a fixed-length array. That won't work. As you will see if you do this at the command prompt:

>>> [] * 9
[]

This is because the * really is a multiply, and just as [1] * 3 gives [1, 1, 1] (three repetitions of the list [1]) doing [] * 9 gives 9 repetitions of the empty list, which is just as empty as one repetition.

Instead create empty lists:

id=[]
a=[]
...etc...

Then, in your loop, do not index into these lists, append() new values to them instead:

id.append(row[0])
a.append(row[1])
...etc...

That means you don't need to keep track of line_count, and even if you do need to do that, use the provided method csv_reader.line_num().

Using Excel screenshots to look at a CSV is often misleading. It is clear that your version of Excel expects the delimiter of the CSV to be a semicolon not a comma, which is why the data is all in one column. To be 100% sure of what is in the file, open it in a text editor like Notepad or Notepad++. That avoids Excel's aggressive type coercion, which changes anything that looks like a date, or a hexadecimal string, into a number. And above all do not save the CSV back from Excel and assume the file still to be as expected.

It is clear that the code you presented will not run. It will get an IndexError the first time through the loop. You have to fix the code before it will run, and when you do that you will see that Python really does respect the comma as delimiter.

But opening the input file in Excel has given you a mistaken idea of where the problem is. You are quite right to say that comma is clearly the intended delimiter in the file. But when you open a CSV in Excel, Excel uses your system decimal and delimiter settings, which for European installations of Windows and MacOS are usually , and ;.

Excel is not bright enough to figure out on its own that those settings are inappropriate for a given file; it needs help from you. You can change Excel's File | Open behaviour by altering your system settings, but if you change the delimiter to , you will have to change the decimal point to . (for every single application, not just Excel) and it is unlikely you would want to do that.

The workaround is to set it manually for a particular file, by importing the CSV instead of simply opening it. On the Data tab select From Text/CSV and Excel will then try to guess the settings from the first 2000 rows. If it guesses wrong you have the opportunity to fix it.

But getting Excel to display the file as you expect has nothing to do with the way Python is reading it.

Upvotes: 1

Tim Pietzcker
Tim Pietzcker

Reputation: 336108

You appear to be using a non-US version of Excel. In locales where the comma is used as a decimal separator, Excel expects the semicolon as the column delimiter:

   csv_reader = csv.reader(csv_file, delimiter=';')

Upvotes: 1

Related Questions