Reputation: 5097
I have a number of CSV files where the head looks something like:
09/07/2014,26268315,,
10/07/2014,6601181,16.3857
11/07/2014,916651,12.5879
14/07/2014,213357,,
15/07/2014,205019,10.8607
I need to read this into a dataframe and remove any rows with ,,
however when I read the CSV data into a dataframe using:
df = pd.read_csv(raw_directory+'\\'+filename, error_bad_lines=False,header=None)
I get:
0 1 2 3
0 09/07/2014 26268315 NaN NaN
1 10/07/2014 6601181 16.3857 NaN
2 11/07/2014 916651 12.5879 NaN
3 14/07/2014 213357 NaN NaN
4 15/07/2014 205019 10.8607 NaN
How can I read the CSV data into a dataframe and get:
0
0 09/07/2014,26268315,,
1 10/07/2014,6601181,16.3857
2 11/07/2014,916651,12.5879
3 14/07/2014,213357,,
4 15/07/2014,205019,10.8607
I need to remove any rows where there are ,,
present. and then resave the adjusted dataframe to a new CSV file. I was going to use:
stringList = [',,']
df = df[~df[0].isin([stringList])]
to remove the rows with ,, present so the resulting .csv head looks like:
10/07/2014,6601181,16.3857
11/07/2014,916651,12.5879
15/07/2014,205019,10.8607
Upvotes: 1
Views: 83
Reputation: 862601
I guess here is possible remove all columns with all NaN
s and then rows with any NaN
s:
df = df.dropna(axis=1, how='all').dropna()
print (df)
0 1 2
1 10/07/2014 6601181 16.3857
2 11/07/2014 916651 12.5879
4 15/07/2014 205019 10.8607
Another solution is add separator which value is not in data like |
and then filter by endswith
:
df = pd.read_csv(raw_directory+'\\'+filename, error_bad_lines=False,header=None, sep='|')
df = df[~df[0].str.endswith(',')]
#alternative solution - $ is for end of string
#df = df[~df[0].str.contains(',$')]
print (df)
0
1 10/07/2014,6601181,16.3857
2 11/07/2014,916651,12.5879
4 15/07/2014,205019,10.8607
Upvotes: 3