Reputation: 1331
I am working with data that I download from the web in CSV format. The original data looks like this:
Test Data
"Date","T1","T2","T3","T4","T5","T6","T7","T8"
"105/11/01","123,855","1,150,909","9.30","9.36","9.27","9.28","-0.06","60",
"105/11/02","114,385","1,062,118","9.26","9.42","9.23","9.31","+0.03","78",
"105/11/03","71,350","659,848","9.30","9.30","9.20","9.28","-0.03","42",
I use the following code to read it:
import pandas as pd
df = pd.read_csv("test.csv", skiprows=[0], usecols=[0,3,4,5])
I have also tried:
import pandas as pd
df = pd.read_csv("test.csv", skiprows=[0], usecols=[0,3,4,5], keep_date_col=True)
I always get the following result:
Date T3 T4 T5
105/11/01 9.30 9.36 9.27 NaN
105/11/02 9.26 9.42 9.23 NaN
105/11/03 9.30 9.30 9.20 NaN
This is what I want to get:
Date T3 T4 T5
105/11/01 9.30 9.36 9.27
105/11/02 9.26 9.42 9.23
105/11/03 9.30 9.30 9.20
As you can see, pandas treats the date string as the index rather than as part of the data. This shifts every column one position to the left, which causes the last column to be NaN.
I have read the pandas documentation on read_csv() and found that it can parse dates with the parse_dates and keep_date_col parameters, but is there any way to NOT parse the date the way it is doing now?
Upvotes: 2
Views: 972
Reputation: 214977
This seems to work well:
import pandas as pd
df = pd.read_csv("test.csv", skiprows=[0], usecols=[0,3,4,5], index_col=False)
df
# Date T3 T4 T5
#0 105/11/01 9.30 9.36 9.27
#1 105/11/02 9.26 9.42 9.23
#2 105/11/03 9.30 9.30 9.20
The help docs also say this about index_col:
index_col : int or sequence or False, default None
Column to use as the row labels of the DataFrame. If a sequence is given, a MultiIndex is used. If you have a malformed file with delimiters at the end of each line, you might consider index_col=False to force pandas to _not_ use the first column as the index (row names).
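For anyone who wants to try this without the original file, here is a minimal sketch that feeds the sample rows from the question through io.StringIO. It assumes the downloaded file starts with a title line ("Test Data") that skiprows=[0] drops, and that every data row ends with a trailing comma, as in the sample; the variable name raw is only for illustration.

```python
import io

import pandas as pd

# Assumed file contents: one title line, a header with 9 names, and data
# rows that each end with a trailing comma (so they have 10 fields).
raw = '''Test Data
"Date","T1","T2","T3","T4","T5","T6","T7","T8"
"105/11/01","123,855","1,150,909","9.30","9.36","9.27","9.28","-0.06","60",
"105/11/02","114,385","1,062,118","9.26","9.42","9.23","9.31","+0.03","78",
"105/11/03","71,350","659,848","9.30","9.30","9.20","9.28","-0.03","42",
'''

# index_col=False tells pandas not to use the first column as the index,
# so "Date" stays a regular column and the extra trailing field is dropped.
df = pd.read_csv(
    io.StringIO(raw),
    skiprows=[0],           # skip the title line
    usecols=[0, 3, 4, 5],   # Date, T3, T4, T5
    index_col=False,
)

print(df)
```

The date strings are left as plain text here because parse_dates is not passed; the shift in the question comes from the trailing delimiter, not from date parsing.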
Upvotes: 2