Reputation: 1323
I have a sorted CSV dataset with four columns that I want to use as a MultiIndex, including two DateTime columns:
Alex,Beta,2011-03-01 00:00:00,2011-03-03 00:00:00,A,8,11.4
Alex,Beta,2011-03-03 00:00:00,2011-03-05 00:00:00,B,10,17.2
Alex,Beta,2011-03-05 00:00:00,2011-03-07 00:00:00,A,3,11.4
Alex,Beta,2011-03-07 00:00:00,2011-03-09 00:00:00,B,7,17.2
Alex,Orion,2011-03-02 00:00:00,2011-03-04 00:00:00,A,4,11.4
Alex,Orion,2011-03-03 00:00:00,2011-03-05 00:00:00,B,6,17.2
Alex,Orion,2011-03-04 00:00:00,2011-03-06 00:00:00,A,3,11.4
Alex,Orion,2011-03-05 00:00:00,2011-03-07 00:00:00,B,11,17.2
Alex,ZZYZX,2011-03-02 00:00:00,2011-03-05 00:00:00,A,10,11.4
Alex,ZZYZX,2011-03-04 00:00:00,2011-03-07 00:00:00,A,15,11.4
Alex,ZZYZX,2011-03-06 00:00:00,2011-03-09 00:00:00,B,20,17.2
Alex,ZZYZX,2011-03-08 00:00:00,2011-03-11 00:00:00,B,5,17.2
I can load this with read_csv and display the DataFrame hierarchically. But indexing it is another matter. The nearest I can tell is that pandas doesn't like using DateTime indexes here. If I comment out the DateTime labels in index_col as well as the corresponding entries in the indexing statement (df.loc), it works fine.
Any ideas?
#!/usr/bin/env python
import numpy as np
import pandas as pd
pd.set_option('display.height', 400)
pd.set_option('display.width', 400)
pd.set_option('display.max_rows', 1000)
pd.set_option('display.max_columns', 30)
pd.set_option('display.line_width', 200)
try:
df = pd.read_csv(
'./sales.csv',
header = None,
na_values = ['NULL'],
names = [
'salesperson',
'customer',
'invoice_date',
'ship_date',
'product',
'quantity',
'price',
],
index_col = [
'salesperson',
'customer',
'invoice_date',
'ship_date',
],
parse_dates = [
'invoice_date',
'ship_date',
],
)
except Exception as e:
print(e)
try:
print(df)
print(df.loc[(
'Alex', # salesperson
'ZZYZX', # customer
'2011-03-02 00:00:00', # invoice_date
'2011-03-05 00:00:00', # ship_date
)])
except Exception as e:
print(e)
Upvotes: 1
Views: 718
Reputation: 64463
It seems to work fine, im getting a proper df. Although i would try avoiding the empty entries in every list.
If you use parse_dates
you should also access those columns with a proper datetime
object:
df.loc[('Alex','ZZYZX',pd.datetime(2011,3,2),pd.datetime(2011,3,5))]
product A
quantity 10
price 11.4
Name: (Alex, ZZYZX, 2011-03-02 00:00:00, 2011-03-05 00:00:00), dtype: object
Upvotes: 1