Reputation: 953
I have following input trans.csv file:
Date,Currenncy,Symbol,Type,Units,UnitPrice,Cost,Tax
2012-03-14,USD,AAPL,BUY,1000
2012-05-12,USD,SBUX,SELL,500
The fields UnitPrice, Cost and Tax are optional. If they are not specified I expect NaN in the DataFrame cell.
I read the csv file with:
t = pandas.read_csv('trans.csv', parse_dates=True, index_col=0)
and got the following result:
Currenncy Symbol Type Units UnitPrice Cost Tax
Date
2012-03-14 USD AAPL BUY 1000 2012-05-12 012-05-12 12-05-12
2012-02-05 USD SBUX SELL 500 NaN NaN NaN
Why are there no NaN in the first row and is the Date repeated? Any workaround to get NaN for the unspecified fields?
Upvotes: 2
Views: 3416
Reputation: 375445
Here's a method which can handle some more cases (when there is some data in UnitCost
, Cost
, etc.).
In [1]: df = pd.read_csv('trans.csv', header=None)
In [2]: df.columns = df.ix[0]
In [3]: df[1:].set_index('Date')
Out[3]:
Currenncy Symbol Type Units UnitPrice Cost Tax
Date
2012-03-14 USD AAPL BUY 1000 NaN NaN NaN
2012-05-12 USD SBUX SELL 500 NaN NaN NaN
2012-05-12 USD SBUX SELL 500 NaN NaN NaN
It's worth noting that the dtype of the these columns will be object
.
However, I think this should be caught by to_csv
so I posted as an issue on github.
Upvotes: 2
Reputation: 7357
Your CSV file is malformed. I get the same answer as you in Pandas 0.10, and while I admit that it is indeed very, very strange, you shouldn't be feeding it malformed data.
Date,Currenncy,Symbol,Type,Units,UnitPrice,Cost,Tax
2012-03-14,USD,AAPL,BUY,1000,,,
2012-05-12,USD,SBUX,SELL,500,,,
returns the expected
>>> import pandas as pd
>>> t = pd.read_csv('pandas_test', parse_dates=True, index_col=0)
>>> t
Currenncy Symbol Type Units UnitPrice Cost Tax
Date
2012-03-14 USD AAPL BUY 1000 NaN NaN NaN
2012-05-12 USD SBUX SELL 500 NaN NaN NaN
Upvotes: 3