numpy genfromtxt issues with .txt input

Question

I'm trying to import a .txt file that has both string and numeric columns using numpy.genfromtxt. Essentially, I need an array of strings. Here is a sample file that gives me trouble:

    H2S 1.4
    C1  3.6

The file is encoded as Unicode (UTF-16, which is why the converter decodes with 'utf-16'). Here's the code I'm using:

    import numpy as np

    decodf = lambda x: x.decode('utf-16')
    sample = np.genfromtxt('ztest.txt', dtype=str,
                           converters={0: decodf, 1: decodf},
                           delimiter='\t',
                           usecols=0)
    print(sample)

Here's the output:

    ['H2S' 'None']

I've tried several ways to fix this. Setting dtype=None and removing the converters, I get:

    [b'\xff\xfeH\x002\x00S' b'\x00g\x00\xe8\x00n']

I also tried removing the converters and setting dtype=str, which raised:

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xff in position 0: ordinal not in range(128)

I understand this function can be troublesome. I saw several suggested approaches (e.g. here) but couldn't get any of them to work.

What am I doing wrong? In the meantime, I'm looking into pandas. Thanks in advance.

Answers (1)
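
One possible fix, sketched under a few assumptions: the file really is UTF-16 (the `\xff\xfe` bytes at the start of your dtype=None output look like a UTF-16 byte-order mark), the columns are tab-separated as in your delimiter, and your NumPy is 1.14 or newer, where genfromtxt accepts an `encoding` argument. Decoding each field in a converter likely fails because the raw bytes have already been split on a one-byte tab, which cuts through the two-byte UTF-16 code units. Letting genfromtxt decode the whole file first should avoid that. The helper name `read_first_column` and the temporary demo file are made up for illustration.

```python
import os
import tempfile

import numpy as np


def read_first_column(path, encoding="utf-16"):
    """Return the first tab-separated column of a text file as an array of str.

    Hypothetical helper for illustration; the `encoding` argument of
    genfromtxt requires NumPy >= 1.14.
    """
    return np.genfromtxt(path, dtype=str, delimiter="\t",
                         usecols=0, encoding=encoding)


if __name__ == "__main__":
    # Build a small UTF-16 file shaped like the sample in the question
    # (assumption: the real file uses tabs between the two columns).
    with tempfile.NamedTemporaryFile("w", suffix=".txt", encoding="utf-16",
                                     delete=False) as handle:
        handle.write("H2S\t1.4\nC1\t3.6\n")
    try:
        print(read_first_column(handle.name))
    finally:
        os.remove(handle.name)
```

If you need both columns, dropping `usecols` should give you a 2-D array of strings, and the numeric column can then be converted with `.astype(float)`.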
