Dalton Cézane

Reputation: 3782

csv.reader returning "OrderedDict" value in a field name

I am writing a script that reads some values from a .csv file and writes them to another .csv file.

import csv
import json

# filename is defined elsewhere in the script (not shown here)
header = ["Title", "Authors", "Year", "Abstract", "Keywords"]

fields_number = int(input("Enter the number of fields you want to get: "))

field_names = list()
field_values = list()
for i in range(0, fields_number):
    field_name = input("Enter the field name: ")
    field_names.append(field_name)

try:
    with open(filename) as csvfile:
        rowsreader = csv.DictReader(csvfile)
        for row in rowsreader:
            print(row)
            json_row = '{'
            for i in range(0, len(field_names)):
                field = field_names[i]
                json_row += '"{}":"{}"'.format(header[i], row[field])
                json_row += ',' if (i < len(field_names) - 1) else '}'
            field_values.append(json.loads(json_row))
except IOError:
    print("Could not open csv file: {}.".format(filename))

I am getting the following output:

Traceback (most recent call last):
  File "slr_helper.py", line 58, in <module>
    main()
  File "slr_helper.py", line 37, in main
    json_row += '"{}":"{}"'.format(header[i], row[field])
KeyError: 'Authors'

The beginning of the csv file has the following values:

Authors,Author Ids,Title,Year,Source title,Volume,Issue,Art. No.,Page start,Page end,Page count,Cited by,DOI,Link,Abstract,Author Keywords,Index Keywords,Sponsors,Publisher,Conference name,Conference date,Conference location,Conference code,Document Type,Access Type,Source,EID
"AlHogail A., AlShahrani M.","51060982200;57202888364;","Building consumer trust to improve Internet of Things (IoT) technology adoption",2019,

But the code is printing this when reading the csv file:

OrderedDict([('\ufeffAuthors', 'AlHogail A., AlShahrani M.'), ('Author Ids', '51060982200;57202888364;'),...

I would like to know how to avoid the '\ufeff' prefix on the first field name ('\ufeffAuthors' instead of 'Authors'), since it is causing the KeyError I am getting.

Upvotes: 2

Views: 958

Answers (1)

Sasha Tsukanov

Reputation: 1125

As juanpa.arrivillaga pointed out, \ufeff is the byte order mark (BOM). It sits at the very beginning of the file, which the UTF-8 format permits.

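By default, Python 3 opens text files with the locale's preferred encoding, which is UTF-8 on most Linux and macOS systems. The 'utf-8' codec doesn't treat the BOM differently from any other code point, so it is read as part of the text and ends up glued to the first field name. Specifying the encoding as 'utf-8-sig' makes Python skip a leading BOM: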

with open(filename, encoding='utf-8-sig') as csvfile:
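
To see the difference between the two codecs, here is a minimal sketch. It assumes a small made-up sample file (not the asker's real data) that starts with a BOM, and the helper name read_field_names is hypothetical; it just returns the header names that csv.DictReader sees.

```python
import csv
import os
import tempfile


def read_field_names(path, encoding):
    """Return the header names csv.DictReader sees for the given encoding."""
    with open(path, encoding=encoding, newline="") as csvfile:
        return csv.DictReader(csvfile).fieldnames


# Hypothetical sample data; encoding with 'utf-8-sig' prepends the BOM bytes.
sample = "Authors,Title,Year\r\nAlHogail A.,Building consumer trust,2019\r\n"
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as tmp:
    tmp.write(sample.encode("utf-8-sig"))
    sample_path = tmp.name

names_plain = read_field_names(sample_path, "utf-8")    # BOM kept in first name
names_sig = read_field_names(sample_path, "utf-8-sig")  # BOM stripped
os.remove(sample_path)

print(names_plain)
print(names_sig)
```

With 'utf-8' the first header comes back as '\ufeffAuthors', so row['Authors'] raises KeyError; with 'utf-8-sig' it is plain 'Authors'. Note that 'utf-8-sig' also reads files without a BOM correctly, so it should be safe to use when you are not sure whether one is present.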

By the way, if you are on Linux, you can run file ${filename} in the terminal; it prints details about the file's encoding.
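
If the file command is not available (for example on Windows), a similar check can be sketched in Python by looking at the first bytes of the file. The function name starts_with_utf8_bom is hypothetical; codecs.BOM_UTF8 is the standard library constant for the three BOM bytes.

```python
import codecs


def starts_with_utf8_bom(path):
    """Return True if the file begins with the UTF-8 byte order mark."""
    with open(path, "rb") as f:
        # Read exactly as many bytes as the BOM has (3) and compare.
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8
```

For example, starts_with_utf8_bom(filename) would be expected to return True for the file in the question, given the '\ufeff' shown in its first field name.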

Upvotes: 4
