Reputation: 13
I am trying to use numpy's genfromtxt to read CSV files of bond lengths and energies into arrays. The arrays will be used to generate a potential energy surface and reaction path with scipy.interpolate, so I need every value.
The problem is that genfromtxt reads the first value of every CSV file as NaN. How do I fix this?
As an example, I have the following data in dcm_oh_lengths.csv:
1.0763,1.1263,1.1763,1.2263,1.2763,1.3263,1.3763,1.4263,1.4763,1.5263,1.5763
And I call it with
oh_all = np.genfromtxt(solv+'_oh_lengths.csv',dtype=float,delimiter=',')
And oh_all returns
array([ nan, 1.1263, 1.1763, 1.2263, 1.2763, 1.3263, 1.3763, 1.4263,
1.4763, 1.5263, 1.5763])
So the first datapoint is read as missing. If I change the data to
,1.0763,1.1263,1.1763,1.2263,1.2763,1.3263,1.3763,1.4263,1.4763,1.5263,1.5763
Then doing the same thing returns
array([ nan, 1.0763, 1.1263, 1.1763, 1.2263, 1.2763, 1.3263, 1.3763,
1.4263, 1.4763, 1.5263, 1.5763])
As a longer example, the first few lines of the energies file (dcm_energies.csv) are:
-7162979.201,-7163010.482,-7163033.634,-7163043.279,-7163060.113,-7163068.894,-7163076.255,-7163078.541,-7163080.908,-7163056.179,-7163081.743
-7163005.74,-7163031.808,-7163050.794,-7163056.603,-7163064.619,-7163070.65,-7163080.606,-7163080.682,-7163081.125,-7163052.444,-7163078.824
-7163024.746,-7163046.199,-7163061.278,-7163063.603,-7163068.336,-7163071.692,-7163079.11,-7163077.25,-7163075.861,-7163043.325,-7163070.561 (...)
And calling it through genfromtxt as above gives:
array([[ nan, -7163010.482, -7163033.634, -7163043.279,
-7163060.113, -7163068.894, -7163076.255, -7163078.541,
-7163080.908, -7163056.179, -7163081.743],
[-7163005.74 , -7163031.808, -7163050.794, -7163056.603,
-7163064.619, -7163070.65 , -7163080.606, -7163080.682,
-7163081.125, -7163052.444, -7163078.824],
[-7163024.746, -7163046.199, -7163061.278, -7163063.603,
-7163068.336, -7163071.692, -7163079.11 , -7163077.25 ,
-7163075.861, -7163043.325, -7163070.561], (...)
Upvotes: 1
Views: 3793
Reputation: 1
As Warren has pointed out, it is a BOM issue.
A possibly easier solution I found online is to open your CSV file in Notepad++. The status bar at the bottom right shows whether the file is saved as UTF-8 with a BOM.
If it is, click Encoding, select UTF-8, and save the file. This removes the need to add further code.
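If there are many CSV files, the same check and fix can be scripted instead of done by hand in an editor. The following is a minimal sketch using only the standard library; the helper names `has_utf8_bom` and `strip_utf8_bom` are made up for illustration and are not part of numpy or any other package.

```python
import codecs
from pathlib import Path


def has_utf8_bom(path):
    """Return True if the file starts with the UTF-8 byte order mark."""
    with open(path, "rb") as f:
        return f.read(len(codecs.BOM_UTF8)) == codecs.BOM_UTF8


def strip_utf8_bom(path):
    """Rewrite the file without its leading UTF-8 BOM.

    Returns True if a BOM was found and removed, False otherwise.
    """
    data = Path(path).read_bytes()
    if not data.startswith(codecs.BOM_UTF8):
        return False
    # Overwrite the file in place with everything after the BOM bytes
    Path(path).write_bytes(data[len(codecs.BOM_UTF8):])
    return True
```

Calling `strip_utf8_bom('dcm_oh_lengths.csv')` would rewrite that file in place, so keeping a backup copy first is probably wise.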
Upvotes: 0
Reputation: 114811
My guess is that the file begins with a "byte order mark" (BOM). How was the file created?
Try this:
import numpy as np
# 'utf-8-sig' decodes UTF-8 and drops a leading BOM if one is present
with open('dcm_oh_lengths.csv', 'r', encoding='utf-8-sig') as f:
    oh_all = np.genfromtxt(f, dtype=float, delimiter=',')
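To see why a BOM would turn only the first value into NaN, here is a small sketch that does not need the actual file. It assumes the file is UTF-8 with a BOM (a guess, as said above) and uses a shortened copy of the first line as sample data. When the bytes are decoded as plain 'utf-8', the BOM stays attached to the first field as the character U+FEFF, and that field can no longer be parsed as a float. As far as I know, genfromtxt with its default `loose=True` does not raise for such a value and fills in nan instead.

```python
import codecs

# Shortened first line of the CSV as raw bytes, with a UTF-8 BOM in front
raw = codecs.BOM_UTF8 + b"1.0763,1.1263,1.1763\n"

plain = raw.decode("utf-8")      # BOM survives as the character U+FEFF
clean = raw.decode("utf-8-sig")  # BOM is dropped during decoding

first_plain = plain.split(",")[0]
first_clean = clean.split(",")[0]


def parses_as_float(text):
    """Return True if float() accepts the text, False if it raises ValueError."""
    try:
        float(text)
        return True
    except ValueError:
        return False


print(repr(first_plain), parses_as_float(first_plain))
print(repr(first_clean), parses_as_float(first_clean))
```

The first field differs between the two decodings only by the invisible leading character, which is why the file looks fine in a text editor.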
Upvotes: 5