Reputation: 93
I am using the following code to import a CSV file. It works well except when it encounters a three-digit number followed by a decimal. Below is my code and the result.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
def fft(x, Plot_ShareY=True):
    dfs = pd.read_csv(x, delimiter=";", skiprows=(1,2), decimal=",", na_values='NaN') #loads the csv files
    #replaces non-numeric symbols to NaN.
    dfs = dfs.replace({'-∞': np.nan, '∞': np.nan})
    #print(dfs) #before dropping NaNs
    #each column taken into a separate variable
    time = dfs['Time'] #- np.min(dfs['Time'])
    channelA = dfs['Channel A']
    channelB = dfs['Channel B']
    channelC = dfs['Channel C']
    channelD = dfs['Channel D']
    channels = [channelA, channelB, channelC, channelD]
    #printing the smallest index number which is NaN
    ind_num_A = np.where(channelA.isna())[0][0]
    ind_num_B = np.where(channelB.isna())[0][0]
    ind_num_C = np.where(channelC.isna())[0][0]
    ind_num_D = np.where(channelD.isna())[0][0]
    ind_num = [ind_num_A, ind_num_B, ind_num_C, ind_num_D]
    #dropping all rows after the first NaN is found
    rem_ind = np.amin(ind_num) #finds the array-wise minimum
    #print('smallest index to be deleted is: ' +str(rem_ind))
    dfs = dfs.drop(dfs.index[rem_ind:])
    print(dfs) #after dropping NaNs
The result is what I want except for the last five rows in Channels B and C, where a comma appears instead of a point as the decimal separator. I don't know why it works everywhere else but not for a few rows. The CSV file can be found here.
Upvotes: 1
Views: 1169
Reputation: 3048
I think you need to replace the non-numeric symbols -∞ and ∞ with NaN while reading the file, not after the fact. If you do it after the data frame is created, the values have already been read in and parsed as data type str instead of float. This messes up the data types of the column.
So instead of na_values='NaN' do this: na_values=["-∞", "∞"], so the code is like this:
dfs = pd.read_csv(x, delimiter=";", skiprows=(1,2), decimal=",", na_values=["-∞", "∞"])
#replaces non-numeric symbols to NaN.
# dfs = dfs.replace({'-∞': np.nan, '∞': np.nan}) # not needed anymore
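To see the difference in isolation, here is a minimal sketch that reads the same text twice, once without and once with the na_values list. The sample data and the names csv_text, raw and parsed are made up for illustration (this is not the asker's file), and the skiprows argument is left out because the sample has no extra header rows.

```python
import io

import pandas as pd

# Hypothetical sample data, standing in for the real CSV file.
csv_text = (
    "Time;Channel A;Channel B\n"
    "0,0;1,5;100,25\n"
    "1,0;2,5;-∞\n"
    "2,0;∞;300,75\n"
)

# Without na_values: a column containing '∞' or '-∞' cannot be parsed as
# float, so its values stay as text and keep the decimal comma.
raw = pd.read_csv(io.StringIO(csv_text), delimiter=";", decimal=",")

# With na_values: the symbols become NaN at read time, so every column
# can be parsed as float with ',' treated as the decimal separator.
parsed = pd.read_csv(
    io.StringIO(csv_text),
    delimiter=";",
    decimal=",",
    na_values=["-∞", "∞"],
)

print(raw.dtypes)
print(parsed.dtypes)
print(parsed)
```

Comparing the two dtypes printouts should show which columns were kept as text in the first read.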
Upvotes: 1
Reputation: 176
It looks like a data type issue. Some of the values were read as strings, so pandas will not convert them to float until the ',' is replaced with '.'.
One option is to convert each column after you read the file, with something like: df['colname'] = df['colname'].str.replace(',', '.').astype(float)
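A rough sketch of that after-the-fact conversion, with two changes that are my own assumptions rather than part of the suggestion above: the column is cast to str first, so values that were already parsed as float are not lost by .str.replace, and pd.to_numeric with errors="coerce" is used in place of astype(float), so symbols like '-∞' become NaN instead of raising. The helper name to_float and the sample values are hypothetical.

```python
import pandas as pd


def to_float(column: pd.Series) -> pd.Series:
    """Convert a column with decimal commas (possibly mixed types) to float."""
    # Cast everything to text so floats and strings are handled the same way.
    text = column.astype(str).str.replace(",", ".", regex=False)
    # Anything that still is not a number (e.g. '-∞') becomes NaN.
    return pd.to_numeric(text, errors="coerce")


# Hypothetical mixed column: one real float, two comma strings, one symbol.
mixed = pd.Series([1.5, "100,25", "-∞", "300,75"], dtype=object)
converted = to_float(mixed)
print(converted)
```

You would apply it per column, for example dfs['Channel B'] = to_float(dfs['Channel B']).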
Upvotes: 2