user1850133

Reputation: 2993

numpy genfromtxt converters unknown number of columns

I have several numeric data files in which the decimal separator is a comma, so I use a lambda function to do the conversion:

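import numpy as np

def decimal_converter(num_cols):
    # One converter per column: swap the decimal comma for a point, then parse.
    # valstr arrives as bytes here (older NumPy default); drop .decode() if it is already str.
    conv = dict((col, lambda valstr:
                 float(valstr.decode('utf-8').replace(',', '.')))
                for col in range(num_cols))
    return conv

data = np.genfromtxt("file.csv", delimiter=';', converters=decimal_converter(3))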

The data in the file looks like this:

0; 0,28321815;  0,5819178
1; 0,56868281;  0,85621369
2; 0,24022026;  0,53490058
3; 0,63641921;  0,0293904
4; 0,65585546;  0,55913776

With my decimal_converter function I have to specify the number of columns the file contains. Normally numpy.genfromtxt does not need to be told the number of columns; it reads however many it finds. I would like to keep that behaviour even when using the converters option.

Upvotes: 3

Views: 6699

Answers (2)

Saullo G. P. Castro

Reputation: 58865

Since genfromtxt() accepts an iterator of lines, you can apply the comma-to-point replacement to each line before it is parsed and avoid the converters parameter altogether, so the number of columns never has to be known:

import numpy as np

def conv(x):
    return x.replace(',', '.').encode()

data = np.genfromtxt((conv(x) for x in open("test.txt")), delimiter=';')
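For anyone who wants to try this without a file on disk, here is a minimal self-contained sketch of the same idea. The in-memory sample and the helper name comma_to_point are only illustrative assumptions, and it relies on genfromtxt accepting str lines from a generator, which recent NumPy versions do:

```python
import io
import numpy as np

def comma_to_point(lines):
    """Yield each line with decimal commas replaced by decimal points."""
    for line in lines:
        yield line.replace(',', '.')

# In-memory stand-in for the real file (first three rows from the question).
sample = io.StringIO(
    "0; 0,28321815;  0,5819178\n"
    "1; 0,56868281;  0,85621369\n"
    "2; 0,24022026;  0,53490058\n"
)

# No converters and no column count: genfromtxt infers the columns itself.
data = np.genfromtxt(comma_to_point(sample), delimiter=';')
print(data.shape)
```

Note that the replacement is applied to the whole line, so this only works when the column delimiter is not itself a comma.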

Upvotes: 7

Warren Weckesser

Reputation: 114781

Using the pandas library might not be an option for you, but if it is, its function read_csv has a decimal argument that can be used to configure the decimal point character. For example,

In [36]: !cat file.ssv
    0; 0,28321815;  0,5819178
    1; 0,56868281;  0,85621369
    2; 0,24022026;  0,53490058
    3; 0,63641921;  0,0293904
    4; 0,65585546;  0,55913776

In [37]: import pandas as pd

In [38]: df = pd.read_csv("file.ssv", delimiter=';', decimal=',', header=None)

In [39]: df
Out[39]: 
   0         1         2
0  0  0.283218  0.581918
1  1  0.568683  0.856214
2  2  0.240220  0.534901
3  3  0.636419  0.029390
4  4  0.655855  0.559138

[5 rows x 3 columns]

You can then manipulate the data with the full pandas toolkit, or convert the data frame to a NumPy array (as_matrix() was removed in pandas 1.0; to_numpy() is its replacement):

In [51]: df.to_numpy()
Out[51]: 
array([[ 0.        ,  0.28321815,  0.5819178 ],
       [ 1.        ,  0.56868281,  0.85621369],
       [ 2.        ,  0.24022026,  0.53490058],
       [ 3.        ,  0.63641921,  0.0293904 ],
       [ 4.        ,  0.65585546,  0.55913776]])
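As a script rather than an IPython session, the same steps might look roughly like the sketch below. The in-memory sample stands in for file.ssv, and skipinitialspace=True is added here only as a precaution for the blanks after each semicolon; it is not part of the session above:

```python
import io
import pandas as pd

# In-memory stand-in for file.ssv (first three rows from the question).
sample = io.StringIO(
    "0; 0,28321815;  0,5819178\n"
    "1; 0,56868281;  0,85621369\n"
    "2; 0,24022026;  0,53490058\n"
)

# decimal=',' tells read_csv to treat the comma as the decimal point;
# the number of columns is inferred from the data.
df = pd.read_csv(sample, sep=';', decimal=',', header=None,
                 skipinitialspace=True)

# Convert to a plain NumPy array; mixed int/float columns become float64.
arr = df.to_numpy()
print(arr.shape, arr.dtype)
```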

Upvotes: 3
