Reputation: 435
I have a CSV containing data that looks like this:
<DATE> <TIME> <OPEN> <LOW> <HIGH> <CLOSE>
2001-01-03 00:00:00 0.9507 0.9505 0.9509 0.9506
....
2015-05-13 02:00:00 0.9496 0.9495 0.9509 0.9505
I want to create an index on <DATE> and <TIME> but retain the two columns as normal columns so I can reference them.
As the data is stored in a CSV I am not sure how I could parse the 2 columns (DATE and TIME) into one before creating the dataframe.
I've had a look at many answers but they seems convoluted for what I am trying to do, and I have became convinced I am missing the simple solution
Context around what lead me to this:
The proper way I've identified setting a new value (for when I am computing rolling mean values) is:
df.set_value('index', 'column', value)
Because my index is currently just on date, referencing the index for a particular row (say, the 1st row) means many values are set instead of one
Upvotes: 2
Views: 65
Reputation: 210942
UPDATE:
In [170]: df = pd.read_csv('/path/to/file.csv', parse_dates={'TIMESTAMP': ['DATE','TIME']}).set_index('TIMESTAMP')
In [171]: df
Out[171]:
OPEN LOW HIGH CLOSE
TIMESTAMP
2001-01-03 00:00:00 0.9507 0.9505 0.9509 0.9506
2001-01-03 01:00:00 0.9507 0.9505 0.9509 0.9506
2001-01-03 02:00:00 0.9507 0.9505 0.9509 0.9506
2001-01-03 03:00:00 0.9507 0.9505 0.9509 0.9506
2001-01-03 04:00:00 0.9507 0.9505 0.9509 0.9506
2001-01-04 00:00:00 0.9507 0.9505 0.9509 0.9506
2001-01-04 01:00:00 0.9507 0.9505 0.9509 0.9506
2001-01-04 02:00:00 0.9507 0.9505 0.9509 0.9506
In [172]: df.index.dtype
Out[172]: dtype('<M8[ns]')
OLD answer:
you can do it this way:
In [155]: df
Out[155]:
a b c
0 0 0 3
1 1 2 0
2 2 2 3
3 1 0 0
4 1 3 2
5 4 0 1
6 2 0 3
7 2 1 2
8 3 3 4
9 0 0 3
In [156]: df.join(df.iloc[:, :2], rsuffix='_idx').set_index((df.iloc[:, :2].columns + '_idx').tolist())
Out[156]:
a b c
a_idx b_idx
0 0 0 0 3
1 2 1 2 0
2 2 2 2 3
1 0 1 0 0
3 1 3 2
4 0 4 0 1
2 0 2 0 3
1 2 1 2
3 3 3 3 4
0 0 0 0 3
BUT, you don't really need it, because it's redundant - you still have your data in the index and can use it...
UPDATE: starting from Pandas 0.20.1 the .ix indexer is deprecated, in favor of the more strict .iloc and .loc indexers.
Upvotes: 2