Reputation: 6191
I'm loading a CSV where the decimal separator is , and I would like to replace it with . in order to proceed with the analysis.
I see the converters option in pandas.read_csv, but to use it I need to provide all the column names I want to convert, which might not be a good idea since there are lots of columns.
What I have in mind is to look at each cell in all columns and replace it:
n_cols = len(df.columns)
print(n_cols)
n_rows = len(df.index)
print(n_rows)
for i in range(n_rows):
    for j in range(n_cols):
        # .ix is deprecated; use .iloc for positional indexing
        df.iloc[i, j] = str(df.iloc[i, j]).replace(',', '.')
Is there a better approach?
Upvotes: 1
Views: 371
Reputation: 33843
You can use the decimal parameter of read_csv:
df = pd.read_csv('file.csv', decimal=',')
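As a minimal sketch (the sample data, the semicolon separator, and the in-memory file here are just assumptions for illustration):
import io
import pandas as pd

# Hypothetical European-style data: ';' separates fields, ',' marks decimals
data = "a;b\n1,5;2,25\n3,0;4,75\n"

df = pd.read_csv(io.StringIO(data), sep=';', decimal=',')
print(df.dtypes)  # both columns come back as float64, no post-processing needed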
Upvotes: 3
Reputation: 760
You don't have to provide all the column names to converters. Give only the columns you want to convert.
It would be converters = {'col_name': lambda x: str(x).replace(',', '.')}
EDIT after rewording of the question.
Is this the best way to do it?
I would say yes. OP mentioned that there are a large number of columns he/she wants to convert and feels that a dict would get out of hand. IMO, it will not. There are two reasons why it won't.
The first reason is that even though you have a large number of columns, I assume there is some pattern to them (like column numbers 2, 4, ... need to be converted). You could run a for loop or a dict comprehension to generate this dict and pass it to converters, as sketched below. Another advantage is that converters accepts both column labels and column indices as keys, so you don't even have to spell out the column labels.
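For instance, a rough sketch of that idea, assuming a hypothetical file.csv where every second column holds comma-decimal values (the file name and the pattern are assumptions):
import pandas as pd

# Read just the header row to get the column names
cols = pd.read_csv('file.csv', nrows=0).columns

# Assumed pattern: every second column needs converting;
# wrapping in float() also gives numeric dtypes right away
conv = {c: (lambda x: float(str(x).replace(',', '.'))) for c in cols[1::2]}

df = pd.read_csv('file.csv', converters=conv)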
Second, a dict is implemented using a hash table, so look-ups are constant time on average. You don't have to worry about slow runtimes when there are a large number of elements in the dictionary.
Though your method is correct, IMO it is reinventing the wheel.
Upvotes: 0