SSC

Reputation: 1331

How to call pandas read_csv() without it parsing date string

I am working with some data that I download from the web in CSV format. The original data looks like this:

Test Data
"Date","T1","T2","T3","T4","T5","T6","T7","T8"
"105/11/01","123,855","1,150,909","9.30","9.36","9.27","9.28","-0.06","60",
"105/11/02","114,385","1,062,118","9.26","9.42","9.23","9.31","+0.03","78",
"105/11/03","71,350","659,848","9.30","9.30","9.20","9.28","-0.03","42",

I use the following code to read it:

import pandas as pd
df = pd.read_csv("test.csv", skiprows=[0], usecols=[0,3,4,5])

I have also tried:

import pandas as pd
df = pd.read_csv("test.csv", skiprows=[0], usecols=[0,3,4,5], keep_date_col=True)

Either way, I always get the following result:

           Date    T3    T4   T5
105/11/01  9.30  9.36  9.27  NaN
105/11/02  9.26  9.42  9.23  NaN
105/11/03  9.30  9.30  9.20  NaN

This is what I want to get

     Date    T3    T4    T5
105/11/01  9.30  9.36  9.27
105/11/02  9.26  9.42  9.23
105/11/03  9.30  9.30  9.20

As you can see, pandas treats the date string as the index rather than as part of the data, so the values shift one column to the left relative to the headers and the last column becomes NaN.
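One way to see where the extra column might come from is to count the fields in the header and in a data row. The sketch below uses the standard `csv` module on the first two lines of the sample above; the variable names are illustrative only.

```python
import csv
import io

# Header line and first data line, copied from the sample file above.
sample = '''"Date","T1","T2","T3","T4","T5","T6","T7","T8"
"105/11/01","123,855","1,150,909","9.30","9.36","9.27","9.28","-0.06","60",
'''

header, first_row = list(csv.reader(io.StringIO(sample)))

# The trailing comma on the data line produces one extra, empty field.
print(len(header), len(first_row))
print(repr(first_row[-1]))
```

The data row has one more field than the header because of the trailing comma, and that mismatch is the situation in which `read_csv()` falls back to using the first column as the index.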

I have read the pandas documentation for read_csv() and found that it can parse dates with the parse_dates and keep_date_col parameters, but is there any way to NOT parse the date the way it is doing now?

Upvotes: 2

Views: 972

Answers (1)

akuiper

Reputation: 214977

This seems to work well:

import pandas as pd
df = pd.read_csv("test.csv", skiprows=[0], usecols=[0,3,4,5], index_col=False)

df
#        Date     T3      T4      T5
#0  105/11/01   9.30    9.36    9.27
#1  105/11/02   9.26    9.42    9.23
#2  105/11/03   9.30    9.30    9.20

The help docs describe this case under index_col:

index_col : int or sequence or False, default None
    Column to use as the row labels of the DataFrame. If a sequence is given, a
    MultiIndex is used. If you have a malformed file with delimiters at the end
    of each line, you might consider index_col=False to force pandas to _not_
    use the first column as the index (row names)
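For anyone who wants to try this without the original file, here is a minimal self-contained sketch. It assumes the file starts with the "Test Data" line shown in the question (hence `skiprows=[0]`) and feeds the same text through `io.StringIO` in place of `test.csv`.

```python
import io

import pandas as pd

# Sample text copied from the question, including the trailing commas.
raw = '''Test Data
"Date","T1","T2","T3","T4","T5","T6","T7","T8"
"105/11/01","123,855","1,150,909","9.30","9.36","9.27","9.28","-0.06","60",
"105/11/02","114,385","1,062,118","9.26","9.42","9.23","9.31","+0.03","78",
"105/11/03","71,350","659,848","9.30","9.30","9.20","9.28","-0.03","42",
'''

# index_col=False tells pandas not to use the first column as the index,
# so "Date" stays a regular column and the headers line up with the values.
df = pd.read_csv(
    io.StringIO(raw),
    skiprows=[0],
    usecols=[0, 3, 4, 5],
    index_col=False,
)

print(df)
```

The Date column is left as plain strings here, since nothing in the call asks pandas to parse dates.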

Upvotes: 2
