physicalattraction
physicalattraction

Reputation: 6858

Read csv starting with leading spaces

I have a comma-separated file (from a third party) in which each line starts and ends with a space, the fields are quoted with a doublequote, and the file ends with a line with only a space.

 "first_name";"last_name" 
 "John";"Doe" 
 "Anita";"Doe"  

I try to read this with the following code.

import csv
import json

def read_csv(filename):
    result = []
    with open(filename, 'r', encoding='utf-8') as f:
        csv_reader = csv.reader(f, delimiter=';', quotechar='"')
        for line_index, line in enumerate(csv_reader):
            if line_index == 0:
                header = line
                continue
            result.append(dict(zip(header, line)))
    return result

if __name__ == '__main__':
    contents = read_csv('test.txt')
    print(json.dumps(contents, indent=4, sort_keys=4))

This is my expected result:

[
    {
        "first_name": "John",
        "last_name ": "Doe "
    },
    {
        "first_name": "Anita",
        "last_name ": "Doe "
    }
]

However, it always takes the doublequotes as part of the first column, due to the leading spaces, plus it takes the last line also into account. This is the result I get:

[
    {
        " \"first_name\"": " \"John\"",
        "last_name ": "Doe "
    },
    {
        " \"first_name\"": " \"Anita\"",
        "last_name ": "Doe "
    },
    {
        " \"first_name\"": " "
    }
]

How can I get rid of these leading and trailing spaces before the csv is parsed? The answer here shows how to remove spaces from fields after it is read, but that wouldn't be good here, since it's not the contents of the fields that I want to change, but the fields themselves.

By the way: I am using Python 3.5.

EDIT

I am skipping empty lines now using the following code:

# Skip empty lines
line = [column.strip() for column in line]
if not any(line):
    continue

Upvotes: 0

Views: 2357

Answers (1)

Jon Clements
Jon Clements

Reputation: 142216

You can use skipinitialspace=True and use a csv.DictReader (which assumes the first row is a header and creates a dict for you of name->value instead of manually doing it yourself) instead, eg:

with open(filename) as fin:
    csvin = csv.DictReader(fin, delimiter=';', skipinitialspace=True)
    result = list(csvin)

Alternatively, if only rows with some value should be considered (ie, the last row with no values, or even iterim blanks row should be filtered out), you can use:

result = [row for row in csvin if any(row.values())]

Which'll give you:

[{'first_name': 'John', 'last_name ': 'Doe '},
 {'first_name': 'Anita', 'last_name ': 'Doe '}]

And the result of that using json.dumps(result, indent=4, sort_keys=4)) is:

[
    {
        "first_name": "John",
        "last_name ": "Doe "
    },
    {
        "first_name": "Anita",
        "last_name ": "Doe  "
    }
]

Upvotes: 3

Related Questions