Tarifazo

Reputation: 4343

numpy genfromtxt issues with .txt input

I'm trying to import a .txt file with string and numeric columns using numpy.genfromtxt. Essentially, I need an array of strings. Here is a sample file that is giving me trouble:

    H2S 1.4
    C1  3.6

The file is encoded as Unicode. Here's the code I'm using:

import numpy as np
decodf = lambda x: x.decode('utf-16')
sample = np.genfromtxt(('ztest.txt'), dtype=str,
                       converters={0: decodf, 1: decodf},
                       delimiter='\t',
                       usecols=0)
print(sample)

Here's the output:

['H2S' 'None']

I've tried several ways to fix this. Setting dtype=None and removing the converters, I get:

[b'\xff\xfeH\x002\x00S' b'\x00g\x00\xe8\x00n']

I also tried removing the converters and setting dtype=str, and got:

UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)

I understand this is a troublesome function. I saw different options (e.g. here) but couldn't get any of them to work.

What am I doing wrong? In the meantime, I'm looking into Pandas... Thanks in advance

Upvotes: 0

Views: 1874

Answers (1)

Warren Weckesser

Reputation: 114956

Your file is encoded as UTF-16, and the first two bytes (\xff\xfe) are the BOM (byte order mark).
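To see why the raw output in the question looks the way it does, here is a minimal sketch. The byte string is an assumption: it is the first field from the question's dtype=None output, with the trailing null byte that a complete UTF-16 little-endian 'S' would carry. The names raw, has_bom and text are just for illustration.

```python
import codecs

# First field of the file as raw bytes (UTF-16, little-endian, with BOM).
# Assumed from the question's output, plus the final \x00 of the 'S'.
raw = b'\xff\xfeH\x002\x00S\x00'

# The two leading bytes match the UTF-16 little-endian byte order mark.
has_bom = raw.startswith(codecs.BOM_UTF16_LE)

# The 'utf-16' codec reads the BOM, picks the byte order, and drops it.
text = raw.decode('utf-16')

print(has_bom, repr(text))
```

So the data itself is fine; it just has to be decoded as UTF-16 before genfromtxt splits it on tabs.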

Try this (with Python 2.7):

import io
import numpy as np

with io.open('ztest.txt', 'r', encoding='UTF-16') as f:
    data = np.genfromtxt(f, delimiter='\t', dtype=None, usecols=[0])  # or dtype=str

genfromtxt has some issues when run in Python 3 with Unicode files. As a workaround, you could simply encode the lines before passing them to genfromtxt. For example, the following encodes each line as latin-1 before passing the lines to genfromtxt:

import io
import numpy as np

with io.open('ztest.txt', 'r', encoding='UTF-16') as f:
    lines = [line.encode('latin-1') for line in f]
    data = np.genfromtxt(lines, delimiter='\t', dtype=None, usecols=[0])
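If you are on a newer NumPy, another option may be to let genfromtxt do the decoding itself. The sketch below assumes NumPy 1.14 or later, where genfromtxt accepts an encoding argument; it writes a small stand-in for ztest.txt to a temporary directory (the path and file contents are made up to mirror the sample in the question) so that it runs on its own.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for ztest.txt: two tab-separated rows saved as UTF-16.
path = os.path.join(tempfile.mkdtemp(), 'ztest.txt')
with open(path, 'w', encoding='utf-16') as f:
    f.write('H2S\t1.4\nC1\t3.6\n')

# Assumes NumPy >= 1.14: `encoding` tells genfromtxt how to decode the file,
# so no converters or manual re-encoding are needed.
data = np.genfromtxt(path, delimiter='\t', dtype=str, usecols=[0],
                     encoding='utf-16')

print(data)
```

With dtype=str the result is an array of Python 3 strings rather than bytes, which is what the question was after.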

Upvotes: 1
