Reputation: 3782
I am writing a script that is responsible for reading some values from a .csv file and writing them to another .csv file.
import csv
import json

header = ["Title", "Authors", "Year", "Abstract", "Keywords"]
fields_number = int(input("Enter the number of fields you want to get: "))
field_names = list()
field_values = list()
for i in range(0, fields_number):
    field_name = input("Enter the field name: ")
    field_names.append(field_name)
try:
    with open(filename) as csvfile:
        rowsreader = csv.DictReader(csvfile)
        for row in rowsreader:
            print(row)
            json_row = '{'
            for i in range(0, len(field_names)):
                field = field_names[i]
                json_row += '"{}":"{}"'.format(header[i], row[field])
                json_row += ',' if (i < len(field_names) - 1) else '}'
            field_values.append(json.loads(json_row))
except IOError:
    print("Could not open csv file: {}.".format(filename))
I am getting the following output:
Traceback (most recent call last):
  File "slr_helper.py", line 58, in <module>
    main()
  File "slr_helper.py", line 37, in main
    json_row += '"{}":"{}"'.format(header[i], row[field])
KeyError: 'Authors'
The beginning of the csv file has the following values:
Authors,Author Ids,Title,Year,Source title,Volume,Issue,Art. No.,Page start,Page end,Page count,Cited by,DOI,Link,Abstract,Author Keywords,Index Keywords,Sponsors,Publisher,Conference name,Conference date,Conference location,Conference code,Document Type,Access Type,Source,EID
"AlHogail A., AlShahrani M.","51060982200;57202888364;","Building consumer trust to improve Internet of Things (IoT) technology adoption",2019,
But the code is printing this when reading the csv file:
OrderedDict([('\ufeffAuthors', 'AlHogail A., AlShahrani M.'), ('Author Ids', '51060982200;57202888364;'),...
I would like to know how to avoid this \ufeff prefix on the first key, since it is causing the error I am getting.
Upvotes: 2
Views: 958
Reputation: 1125
As juanpa.arrivillaga pointed out, \ufeff is the byte order mark (BOM). It sits right at the beginning of the file, which is permitted for the UTF-8 format.
When Python 3 opens a file with encoding='utf-8', it doesn't treat the BOM differently from other code points and reads it as if it were a piece of the text contents. We need to specify the encoding as 'utf-8-sig' to change that:
with open(filename, encoding='utf-8-sig') as csvfile:
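A minimal sketch of the difference, using an in-memory byte stream instead of a file on disk (the sample row is taken from the question's csv):

```python
import csv
import io

# Simulate a UTF-8 csv file that starts with a BOM (bytes EF BB BF),
# as exported by tools like Excel or Scopus.
raw = b'\xef\xbb\xbfAuthors,Title\r\n"AlHogail A., AlShahrani M.",Building consumer trust\r\n'

# Plain utf-8 keeps the BOM glued to the first header name ...
with io.TextIOWrapper(io.BytesIO(raw), encoding='utf-8') as f:
    plain_keys = list(next(csv.DictReader(f)))

# ... while utf-8-sig strips it.
with io.TextIOWrapper(io.BytesIO(raw), encoding='utf-8-sig') as f:
    sig_keys = list(next(csv.DictReader(f)))

print(plain_keys[0])  # '\ufeffAuthors' -> row['Authors'] raises KeyError
print(sig_keys[0])    # 'Authors'
```

With 'utf-8-sig' the first column name is plain 'Authors', so row['Authors'] works as expected.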
By the way, if you are on Linux you can run the file utility on the csv in the terminal; it will print details about its encoding.
Upvotes: 4