Reputation: 1377
My file looks like this:
1497484825;34425;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14
1497484837;34476;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14
I want to import it into a NumPy array using np.genfromtxt. The main problem is that the file uses both ';' and ',' as delimiters. My attempt:
import numpy as np
import io
s = io.StringIO(open('2e70dfa1.csv').read().replace(';',','))
data = np.genfromtxt(s,dtype=int,delimiter=',')
I get this error:
TypeError: Can't convert 'bytes' object to str implicitly
How can I solve this? I'm also open to completely different (better) approaches.
Upvotes: 3
Views: 7525
Reputation: 96287
According to the docs:
Parameters:
fname : file, str, pathlib.Path, list of str, generator
    File, filename, list, or generator to read. If the filename extension is gz or bz2, the file is first decompressed. Note that generators must return byte strings in Python 3k. The strings in a list or produced by a generator are treated as lines.
It is probably easier and more efficient to pass it a generator, bearing in mind that the generator must yield byte strings:
>>> import numpy as np
>>> with open('2e70dfa1.csv', 'rb') as f:
...     clean_lines = (line.replace(b';', b',') for line in f)
...     data = np.genfromtxt(clean_lines, dtype=int, delimiter=',')
...
>>> data
array([[1497484825, 34425, -4, 28, -14,
-4, 28, -14, -4, 28,
-14, -4, 28, -14, -4,
28, -14, -4, 28, -14],
[1497484837, 34476, -4, 28, -14,
-4, 28, -14, -4, 28,
-14, -4, 28, -14, -4,
28, -14, -4, 28, -14]])
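The same generator idea can be wrapped in a small helper. This is only a minimal sketch: load_mixed_delimiters is a hypothetical name, and the in-memory io.BytesIO sample stands in for the real file opened in 'rb' mode.

```python
import io

import numpy as np


def load_mixed_delimiters(byte_lines):
    """Hypothetical helper: byte_lines is any iterable of bytes lines."""
    # Normalise ';' to ',' lazily, one line at a time, so the whole
    # file never has to be held in memory as a single string.
    cleaned = (line.replace(b';', b',') for line in byte_lines)
    return np.genfromtxt(cleaned, dtype=int, delimiter=',')


# Stand-in for: open('2e70dfa1.csv', 'rb')
sample = io.BytesIO(
    b"1497484825;34425;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14\n"
    b"1497484837;34476;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14\n"
)
data = load_mixed_delimiters(sample)
print(data.shape)
```

With a real file, the call would sit inside a with open(..., 'rb') block so that the file stays open while genfromtxt consumes the generator.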
Upvotes: 5
Reputation: 880777
Per the docs for numpy.genfromtxt:
Note that generators must return byte strings in Python 3k.
So instead of creating a StringIO object, create a BytesIO:
import numpy as np
import io

# Read the file as bytes and close it promptly once the contents are in memory
with open('2e70dfa1.csv', 'rb') as f:
    s = io.BytesIO(f.read().replace(b';', b','))
data = np.genfromtxt(s, dtype=int, delimiter=',')
yields
array([[1497484825, 34425, -4, 28, -14,
-4, 28, -14, -4, 28,
-14, -4, 28, -14, -4,
28, -14, -4, 28, -14],
[1497484837, 34476, -4, 28, -14,
-4, 28, -14, -4, 28,
-14, -4, 28, -14, -4,
28, -14, -4, 28, -14]])
Note that if you have Pandas installed, you could use pd.read_table, which allows you to specify a regex pattern as the delimiter:
import pandas as pd
df = pd.read_table('2e70dfa1.csv', sep=';|,', engine='python', header=None)
print(df)
yields
            0      1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17   18   19
0  1497484825  34425   -4   28  -14   -4   28  -14   -4   28  -14   -4   28  -14   -4   28  -14   -4   28  -14
1  1497484837  34476   -4   28  -14   -4   28  -14   -4   28  -14   -4   28  -14   -4   28  -14   -4   28  -14
pd.read_table returns a DataFrame. If you need a NumPy array, you could access it through its values attribute:
In [24]: df.values
Out[24]:
array([[1497484825, 34425, -4, 28, -14,
-4, 28, -14, -4, 28,
-14, -4, 28, -14, -4,
28, -14, -4, 28, -14],
[1497484837, 34476, -4, 28, -14,
-4, 28, -14, -4, 28,
-14, -4, 28, -14, -4,
28, -14, -4, 28, -14]])
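If Pandas is not available, the regex-delimiter idea can also be approximated with the standard library. This is a sketch of a different technique (re.split per line rather than pd.read_table); parse_rows and SAMPLE are hypothetical names, and the sample text stands in for the contents of the file.

```python
import re

import numpy as np

# Stand-in for the text of 2e70dfa1.csv
SAMPLE = (
    "1497484825;34425;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14\n"
    "1497484837;34476;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14\n"
)


def parse_rows(text, pattern=r"[;,]"):
    """Hypothetical helper: split each non-empty line on ';' or ','."""
    rows = [
        [int(field) for field in re.split(pattern, line)]
        for line in text.splitlines()
        if line.strip()  # skip blank lines
    ]
    return np.array(rows, dtype=np.int64)


arr = parse_rows(SAMPLE)
print(arr.shape)
```

This assumes every line has the same number of integer fields; a ragged or non-numeric file would need extra handling.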
Upvotes: 1