Reputation: 2993
I have several numeric data files in which the decimal separator is a comma, so I use a lambda function to do the conversion:
import numpy as np

def decimal_converter(num_cols):
    # One converter per column: decode the raw field and swap ',' for '.'
    conv = dict((col, lambda valstr: \
        float(valstr.decode('utf-8').replace(',', '.'))) for col in range(num_cols))
    return conv

data = np.genfromtxt("file.csv", delimiter=';', converters=decimal_converter(3))
The data in the file looks like this:
0; 0,28321815; 0,5819178
1; 0,56868281; 0,85621369
2; 0,24022026; 0,53490058
3; 0,63641921; 0,0293904
4; 0,65585546; 0,55913776
With my function decimal_converter, I need to specify the number of columns the file contains. Normally I don't need to tell numpy.genfromtxt the number of columns in the file; it takes all it finds. I would like to keep this behavior even when using the converters option.
Upvotes: 3
Views: 6699
Reputation: 58865
Since genfromtxt() accepts an iterator, you can pass it an iterator that applies your conversion function to each line, which lets you avoid the converters parameter entirely:
import numpy as np

def conv(x):
    return x.replace(',', '.').encode()

data = np.genfromtxt((conv(x) for x in open("test.txt")), delimiter=';')
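As a self-contained illustration of the same idea, here is a minimal sketch that can be run without a file on disk. The helper names `comma_to_point` and `load_decimal_comma` are hypothetical, and `io.StringIO` stands in for an open text file; the sample rows are the first three from the question. It assumes a NumPy version whose genfromtxt accepts a generator of str lines.

```python
import io
import numpy as np

# Sample rows copied from the question; io.StringIO stands in for an open file.
SAMPLE = """0; 0,28321815; 0,5819178
1; 0,56868281; 0,85621369
2; 0,24022026; 0,53490058
"""

def comma_to_point(lines):
    """Yield each text line with decimal commas replaced by points."""
    for line in lines:
        yield line.replace(",", ".")

def load_decimal_comma(source, delimiter=";"):
    """Parse an iterable of text lines; the column count is inferred by genfromtxt."""
    return np.genfromtxt(comma_to_point(source), delimiter=delimiter)

data = load_decimal_comma(io.StringIO(SAMPLE))
print(data.shape)
```

With a real file, the same helper could be called inside a `with open("file.csv") as fh:` block so the file is closed afterwards. Note that replacing every comma only works because the field delimiter here is a semicolon.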
Upvotes: 7
Reputation: 114781
Using the pandas library might not be an option for you, but if it is, its function read_csv has a decimal argument that can be used to configure the decimal point character. For example,
In [36]: !cat file.ssv
0; 0,28321815; 0,5819178
1; 0,56868281; 0,85621369
2; 0,24022026; 0,53490058
3; 0,63641921; 0,0293904
4; 0,65585546; 0,55913776
In [37]: import pandas as pd
In [38]: df = pd.read_csv("file.ssv", delimiter=';', decimal=',', header=None)
In [39]: df
Out[39]:
   0         1         2
0  0  0.283218  0.581918
1  1  0.568683  0.856214
2  2  0.240220  0.534901
3  3  0.636419  0.029390
4  4  0.655855  0.559138
[5 rows x 3 columns]
You then have all that pandas goodness with which to manipulate this data. Or you could convert the data frame to a numpy array:
In [51]: df.to_numpy()
Out[51]:
array([[ 0.        ,  0.28321815,  0.5819178 ],
       [ 1.        ,  0.56868281,  0.85621369],
       [ 2.        ,  0.24022026,  0.53490058],
       [ 3.        ,  0.63641921,  0.0293904 ],
       [ 4.        ,  0.65585546,  0.55913776]])
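The session above can be condensed into a short script. This is a sketch under a few assumptions: the data is read from an in-memory string rather than file.ssv, `skipinitialspace=True` is added to drop the blank after each semicolon, and `DataFrame.to_numpy()` is used for the conversion, which requires pandas 0.24 or later.

```python
import io
import pandas as pd

# First three rows from the question, held in memory instead of file.ssv.
SAMPLE = """0; 0,28321815; 0,5819178
1; 0,56868281; 0,85621369
2; 0,24022026; 0,53490058
"""

# decimal="," tells the parser that ',' is the decimal point;
# the number of columns is inferred from the data.
df = pd.read_csv(
    io.StringIO(SAMPLE),
    sep=";",
    decimal=",",
    header=None,
    skipinitialspace=True,
)

# Convert to a plain NumPy array; the mixed int/float columns become floats.
arr = df.to_numpy()
print(arr.shape)
```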
Upvotes: 3