Paul Reiners

Reputation: 7894

pd.read_csv parsing incorrectly

I have a CSV file that looks like this:

Date,Time,Mood,Tags,Medications,Notes
"Jul 25, 2018",9:41 PM,8,,,"",
"Jul 26, 2018",10:05 AM,4,,,"",
"Jul 26, 2018",12:00 PM,3,,,"",
"Jul 26, 2018",7:00 PM,8,,,"",
"Jul 27, 2018",12:01 PM,8,,,"",

I run the following code:

import pandas as pd

df = pd.read_csv("./data/MoodLog_2018_09_14.csv", 
                 dtype={'Date': str, 'Time': str, 'Mood': str, 'Tags': str, 
                        'Medications': str, 'Notes': str})

print(df['Time'].head(5))

and it prints the following:

Jul 25, 2018    8
Jul 26, 2018    4
Jul 26, 2018    3
Jul 26, 2018    8
Jul 27, 2018    8
Name: Time, dtype: object

The Time column is showing the values from the Mood column.

Why is that?

Upvotes: 1

Views: 138

Answers (1)

ALollz

Reputation: 59579

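The issue is that each of your data rows ends with a trailing comma, while the header row does not. Every data row therefore has 7 fields but the header has only 6. When the data has one more field than the header, pandas uses the first column as the index, so every header name ends up one column to the left of the data it was meant to label. That is why the dates show up as the index and the Time column holds the Mood values. Change the header to Date,Time,Mood,Tags,Medications,Notes, (with a trailing comma) and you will get an extra, empty column, which you can then drop.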
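A quick way to confirm this is to count the fields on each line and look at the index pandas builds. This is an illustrative sketch under a couple of assumptions: it embeds the sample rows from the question in a string via io.StringIO instead of reading the real file, and it passes dtype=str as a shorthand for the per-column dtype mapping.

```python
import csv
import io

import pandas as pd

# Sample rows copied from the question (assumption: embedded instead of read from disk).
# Each data row ends with a trailing comma; the header row does not.
data = '''Date,Time,Mood,Tags,Medications,Notes
"Jul 25, 2018",9:41 PM,8,,,"",
"Jul 26, 2018",10:05 AM,4,,,"",
"Jul 26, 2018",12:00 PM,3,,,"",
"Jul 26, 2018",7:00 PM,8,,,"",
"Jul 27, 2018",12:01 PM,8,,,"",
'''

# Count the fields on each line: the header has 6, every data row has 7
field_counts = [len(row) for row in csv.reader(io.StringIO(data))]
print(field_counts)  # [6, 7, 7, 7, 7, 7]

# With one extra field per data row, pandas turns the first column into the index,
# so each header name labels the data that belongs to the column after it
df = pd.read_csv(io.StringIO(data), dtype=str)
print(df.index[0])          # Jul 25, 2018
print(df['Time'].tolist())  # ['8', '4', '3', '8', '8']
```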

Input: test.csv

Date,Time,Mood,Tags,Medications,Notes,
"Jul 25, 2018",9:41 PM,8,,,"",
"Jul 26, 2018",10:05 AM,4,,,"",
"Jul 26, 2018",12:00 PM,3,,,"",
"Jul 26, 2018",7:00 PM,8,,,"",
"Jul 27, 2018",12:01 PM,8,,,"",

Code:

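import pandas as pd
# The trailing comma in the header adds an extra (empty) last column; .iloc[:, :-1] drops it
df = pd.read_csv("test.csv", 
                 dtype={'Date': str, 'Time': str, 'Mood': str, 'Tags': str, 
                        'Medications': str, 'Notes': str}).iloc[:, :-1]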

Output: df

           Date      Time Mood Tags Medications Notes
0  Jul 25, 2018   9:41 PM    8  NaN         NaN   NaN
1  Jul 26, 2018  10:05 AM    4  NaN         NaN   NaN
2  Jul 26, 2018  12:00 PM    3  NaN         NaN   NaN
3  Jul 26, 2018   7:00 PM    8  NaN         NaN   NaN
4  Jul 27, 2018  12:01 PM    8  NaN         NaN   NaN
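If you would rather not edit the file, read_csv also has an option aimed at this case: passing index_col=False disables the first-column-as-index inference, and the pandas IO documentation notes that the trailing extra column is then discarded. A minimal sketch, again assuming the sample rows are embedded in a string rather than read from ./data/MoodLog_2018_09_14.csv:

```python
import io

import pandas as pd

# Sample rows from the question (assumption: embedded instead of read from disk)
data = '''Date,Time,Mood,Tags,Medications,Notes
"Jul 25, 2018",9:41 PM,8,,,"",
"Jul 26, 2018",10:05 AM,4,,,"",
"Jul 26, 2018",12:00 PM,3,,,"",
"Jul 26, 2018",7:00 PM,8,,,"",
"Jul 27, 2018",12:01 PM,8,,,"",
'''

# index_col=False: do not use the first column as the index;
# the extra field produced by the trailing delimiter is dropped
df = pd.read_csv(io.StringIO(data), dtype=str, index_col=False)
print(list(df.columns))     # ['Date', 'Time', 'Mood', 'Tags', 'Medications', 'Notes']
print(df['Time'].tolist())  # ['9:41 PM', '10:05 AM', '12:00 PM', '7:00 PM', '12:01 PM']
```

With the real file, the same idea would be pd.read_csv("./data/MoodLog_2018_09_14.csv", dtype=str, index_col=False), which leaves the CSV untouched.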

Upvotes: 1
