lqope54

Reputation: 93

Why are not all commas converted to decimal points when importing a CSV in pandas?

I am using the following code to import a CSV file. It works well except when it encounters a number with three digits before the decimal separator. Below are my code and the result.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

def fft(x, Plot_ShareY=True): 
    dfs = pd.read_csv(x, delimiter=";", skiprows=(1,2), decimal=",", na_values='NaN') #loads the csv files       
    
    #replaces non-numeric symbols to NaN. 
    dfs = dfs.replace({'-∞': np.nan, '∞': np.nan})
    #print(dfs) #before dropping NaNs
    
    #each column taken into a separate variable
    time = dfs['Time'] #- np.min(dfs['Time']) 
    channelA = dfs['Channel A']
    channelB = dfs['Channel B'] 
    channelC = dfs['Channel C'] 
    channelD = dfs['Channel D']   
    channels = [channelA, channelB, channelC, channelD]   
    
    #printing the smallest index number which is NaN
    ind_num_A = np.where(channelA.isna())[0][0]
    ind_num_B = np.where(channelB.isna())[0][0]
    ind_num_C = np.where(channelC.isna())[0][0]
    ind_num_D = np.where(channelD.isna())[0][0]
    
    ind_num = [ind_num_A, ind_num_B, ind_num_C, ind_num_D]
    
    #dropping all rows after the first NaN is found
    rem_ind = np.amin(ind_num)  #finds the array-wise minimum
    #print('smallest index to be deleted is: ' +str(rem_ind))
    dfs = dfs.drop(dfs.index[rem_ind:])
    print(dfs) #after dropping NaNs

Result

The result is what I want, except for the last five rows in Channel B and Channel C, where a comma appears instead of a point as the decimal separator. I don't know why it works everywhere else but not for these few rows. The CSV file can be found here.

Upvotes: 1

Views: 1169

Answers (2)

BdR

Reputation: 3048

I think you need to treat the non-numeric symbols -∞ and ∞ as NaN while reading the file, not afterwards. If you replace them after the data frame is created, the values have already been read in and the columns that contain those symbols are parsed as data type str instead of float, so their decimal commas are never converted. This messes up the data types of those columns.

So instead of na_values='NaN', pass na_values=["-∞", "∞"]. The code then looks like this:

dfs = pd.read_csv(x, delimiter=";", skiprows=(1,2), decimal=",", na_values=["-∞", "∞"])

#replaces non-numeric symbols to NaN. 
# dfs = dfs.replace({'-∞': np.nan, '∞': np.nan}) # not needed anymore
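
To see the effect in isolation, here is a minimal sketch. The sample data is made up for illustration (it is not the asker's file, and it omits the skiprows argument), but it imitates the semicolon-delimited, comma-decimal layout described in the question:

```python
import io
import pandas as pd

# Hypothetical sample data, NOT the asker's file
csv_text = (
    "Time;Channel A;Channel B\n"
    "0,0;1,5;100,25\n"
    "0,1;2,5;-∞\n"
    "0,2;3,5;∞\n"
)

# Without the infinity symbols in na_values, "Channel B" cannot be
# parsed as numbers, so its values stay as strings such as "100,25"
raw = pd.read_csv(io.StringIO(csv_text), delimiter=";", decimal=",")

# With the symbols declared as missing values, the column parses as float
clean = pd.read_csv(
    io.StringIO(csv_text),
    delimiter=";",
    decimal=",",
    na_values=["-∞", "∞"],
)

print(raw.dtypes)
print(clean.dtypes)
```

Comparing the two dtype listings shows which columns were left as strings in the first read.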

Upvotes: 1

Garrett Johnson

Reputation: 176

It looks like a data type issue. Some of the values in those columns are strings, so pandas does not replace ',' with '.' and convert the column to float automatically.

One option is to convert each column after you read the file, with something like: df['colname'] = df['colname'].str.replace(',', '.').astype(float)
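
A small sketch of that conversion, assuming the infinity symbols have already been replaced with NaN as in the question's code. The column contents here are invented for illustration:

```python
import numpy as np
import pandas as pd

# Hypothetical column that was read as strings, with NaN where the
# infinity symbols used to be
df = pd.DataFrame({"Channel B": ["100,25", np.nan, "200,5"]})

# Swap the decimal comma for a point, then cast the column to float;
# NaN entries pass through the string replacement unchanged
df["Channel B"] = (
    df["Channel B"].str.replace(",", ".", regex=False).astype(float)
)

print(df.dtypes)
```

Note that astype(float) raises an error if symbols such as -∞ are still present in the column, so they need to be replaced first.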

Upvotes: 2
