mihagazvoda

Reputation: 1377

np.genfromtxt multiple delimiters?

My file looks like this:

1497484825;34425;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14
1497484837;34476;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14

I want to import it into a NumPy array using np.genfromtxt. The main problem is that the file uses both ';' and ',' as delimiters. My attempt:

import numpy as np
import io

s = io.StringIO(open('2e70dfa1.csv').read().replace(';',','))

data = np.genfromtxt(s,dtype=int,delimiter=',')

I get this error:

TypeError: Can't convert 'bytes' object to str implicitly

How can I solve this? I'm also open to completely new (better) ideas.

Upvotes: 3

Views: 7525

Answers (2)

juanpa.arrivillaga

Reputation: 96287

According to the docs:

Parameters:
fname : file, str, pathlib.Path, list of str, generator
    File, filename, list, or generator to read. If the filename extension is gz or bz2, the file is first decompressed. Note that generators must return byte strings in Python 3k. The strings in a list or produced by a generator are treated as lines.

It is probably easier and more efficient to give it a generator, bearing in mind that it must yield byte strings:

>>> with open('2e70dfa1.csv', 'rb') as f:
...     clean_lines = (line.replace(b';',b',') for line in f)
...     data = np.genfromtxt(clean_lines, dtype=int, delimiter=',')
...
>>> data
array([[1497484825,      34425,         -4,         28,        -14,
                -4,         28,        -14,         -4,         28,
               -14,         -4,         28,        -14,         -4,
                28,        -14,         -4,         28,        -14],
       [1497484837,      34476,         -4,         28,        -14,
                -4,         28,        -14,         -4,         28,
               -14,         -4,         28,        -14,         -4,
                28,        -14,         -4,         28,        -14]])
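The same replace-then-parse idea can also be sketched with text lines instead of bytes. This is only a sketch under an assumption: it relies on a NumPy version that accepts str lines (as far as I know, text input with an encoding argument arrived in NumPy 1.14), so on the older version from the question the bytes generator above is still the safe route. The sample rows are copied from the question, and the helper name unify_delimiters is hypothetical, made up for illustration.

```python
import io

import numpy as np

# Sample rows copied from the question; a real script would open the CSV file instead.
raw = (
    "1497484825;34425;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14\n"
    "1497484837;34476;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14\n"
)


def unify_delimiters(lines):
    """Yield each text line with ';' replaced by ',' (hypothetical helper)."""
    for line in lines:
        yield line.replace(";", ",")


# StringIO stands in for a file opened in text mode.
text_file = io.StringIO(raw)

# Materialize the cleaned lines as a list of str, which genfromtxt treats as lines.
clean_lines = list(unify_delimiters(text_file))
data = np.genfromtxt(clean_lines, dtype=int, delimiter=",")
print(data.shape)
```

With a real file, text_file would be replaced by a file object opened in text mode inside a with block.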

Upvotes: 5

unutbu

Reputation: 880777

Per the docs for numpy.genfromtxt:

Note that generators must return byte strings in Python 3k.

So instead of creating a StringIO object, create a BytesIO:

import numpy as np
import io

# Read the raw bytes (closing the file promptly) and unify the delimiters.
with open('2e70dfa1.csv', 'rb') as f:
    s = io.BytesIO(f.read().replace(b';', b','))
data = np.genfromtxt(s, dtype=int, delimiter=',')

yields

array([[1497484825,      34425,         -4,         28,        -14,
                -4,         28,        -14,         -4,         28,
               -14,         -4,         28,        -14,         -4,
                28,        -14,         -4,         28,        -14],
       [1497484837,      34476,         -4,         28,        -14,
                -4,         28,        -14,         -4,         28,
               -14,         -4,         28,        -14,         -4,
                28,        -14,         -4,         28,        -14]])

If you have pandas installed, you could instead use pd.read_table, which accepts a regex pattern as the delimiter:

import pandas as pd     
df = pd.read_table('2e70dfa1.csv', sep=';|,', engine='python', header=None)
print(df)

yields

           0      1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19
0  1497484825  34425  -4  28 -14  -4  28 -14  -4  28 -14  -4  28 -14  -4  28 -14  -4  28 -14
1  1497484837  34476  -4  28 -14  -4  28 -14  -4  28 -14  -4  28 -14  -4  28 -14  -4  28 -14

pd.read_table returns a DataFrame. If you need a NumPy array, you can get it from the values attribute (newer pandas versions also offer df.to_numpy()):

In [24]: df.values
Out[24]: 
array([[1497484825,      34425,         -4,         28,        -14,
                -4,         28,        -14,         -4,         28,
               -14,         -4,         28,        -14,         -4,
                28,        -14,         -4,         28,        -14],
       [1497484837,      34476,         -4,         28,        -14,
                -4,         28,        -14,         -4,         28,
               -14,         -4,         28,        -14,         -4,
                28,        -14,         -4,         28,        -14]])
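If pandas is not available, the regex-delimiter idea can be approximated with the standard library's re module and plain NumPy. This is a minimal sketch rather than a replacement for genfromtxt: it assumes every field is an integer and every row has the same number of fields, and the sample lines are copied from the question instead of being read from the file.

```python
import re

import numpy as np

# Sample rows copied from the question; in practice these would come from the file.
lines = [
    "1497484825;34425;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14\n",
    "1497484837;34476;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14;-4,28,-14\n",
]

# Character class matching either delimiter, like sep=';|,' in the pandas call.
delimiter = re.compile(r"[;,]")

rows = []
for line in lines:
    stripped = line.strip()
    if not stripped:
        continue  # skip blank lines
    # Split on either delimiter and convert every field to int.
    rows.append([int(field) for field in delimiter.split(stripped)])

data = np.array(rows, dtype=np.int64)
print(data.shape)
```

Unlike genfromtxt, this raises a ValueError on the first non-integer field instead of filling in a missing value, which may or may not be what you want.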

Upvotes: 1
