Reputation: 3782
I have a DataFrame pd1, read with pandas:
pd1 = pd.read_csv(r'c:\am\wiki_stats\topandas.txt',sep=':',
header=None, names = ['date-time','domain','requests-qty','response-bytes'],
parse_dates=[1], converters={'date-time': to_datetime}, index_col = 'date-time')
It has this index:
>> pd1.index:
DatetimeIndex(['2016-01-01 00:00:00', '2016-01-01 00:00:00',
'2016-01-01 00:00:00', '2016-01-01 00:00:00',
'2016-01-01 00:00:00', '2016-01-01 00:00:00',
'2016-01-01 00:00:00', '2016-01-01 00:00:00',
'2016-01-01 00:00:00', '2016-01-01 00:00:00',
...
'2016-08-05 12:00:00', '2016-08-05 12:00:00',
'2016-08-05 12:00:00', '2016-08-05 12:00:00',
'2016-08-05 12:00:00', '2016-08-05 12:00:00',
'2016-08-05 12:00:00', '2016-08-05 12:00:00',
'2016-08-05 12:00:00', '2016-08-05 12:00:00'],
dtype='datetime64[ns]', name='date-time', length=6084158, freq=None)
But when I try to set the index to that column, I get the error below. I initially wanted a multi-column index, and that is where the error first appeared. I then tried creating another DataFrame from it with other columns as the index, which worked:
pd_new_index = pd1.set_index(['requests-qty','domain'])
and then creating a new frame with the index set back to the 'date-time' column, which gave the same error:
pd_new_2 = pd_new_index.set_index(['date-time'])
'date-time' does not look like a special keyword, and that column is the index now. Why the error?
KeyError                                  Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2656             try:
-> 2657                 return self._engine.get_loc(key)
   2658             except KeyError:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'date-time'

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
----> 1 pd_new_2 = pd_new_index.set_index(['date-time'])

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in set_index(self, keys, drop, append, inplace, verify_integrity)
   4176                 names.append(None)
   4177             else:
-> 4178                 level = frame[col]._values
   4179                 names.append(col)
   4180                 if drop:

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\frame.py in __getitem__(self, key)
   2925             if self.columns.nlevels > 1:
   2926                 return self._getitem_multilevel(key)
-> 2927             indexer = self.columns.get_loc(key)
   2928             if is_integer(indexer):
   2929                 indexer = [indexer]

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
   2657                 return self._engine.get_loc(key)
   2658             except KeyError:
-> 2659                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   2660         indexer = self.get_indexer([key], method=method, tolerance=tolerance)
   2661         if indexer.ndim > 1 or indexer.size > 1:

pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()
pandas/_libs/hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.get_item()

KeyError: 'date-time'
Upvotes: 2
Views: 21304
Reputation: 862641
The reason is that date-time is already the index (here a DatetimeIndex), so it cannot be selected by name the way a column can.
It became the index because of the index_col parameter in read_csv:
pd1 = pd.read_csv(r'c:\am\wiki_stats\topandas.txt',
sep=':',
header=None,
names = ['date-time','domain','requests-qty','response-bytes'],
parse_dates=[1],
converters={'date-time': to_datetime},
index_col = 'date-time')
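To make the cause concrete, here is a minimal sketch using a small in-memory sample instead of the original file (the three sample rows are made up for illustration). It shows that 'date-time' lives in the index rather than in the columns, that set_index therefore raises KeyError, and that one possible workaround is to turn the index back into a column with reset_index first:

```python
import pandas as pd
from io import StringIO

# Hypothetical sample data standing in for topandas.txt
temp = """2016-01-01:d1:0:0
2016-01-02:d2:0:1
2016-01-03:d3:1:0"""

df = pd.read_csv(StringIO(temp),
                 sep=':',
                 header=None,
                 names=['date-time', 'domain', 'requests-qty', 'response-bytes'],
                 parse_dates=['date-time'],
                 index_col='date-time')

# 'date-time' is the name of the index, not one of the columns
print('date-time' in df.columns)
print(df.index.name)

# set_index looks keys up among the columns, so this fails
caught = None
try:
    df.set_index(['date-time'])
except KeyError as exc:
    caught = exc
    print('KeyError:', exc)

# reset_index moves the index back into a regular column,
# after which set_index can use it again
flat = df.reset_index()
multi = flat.set_index(['date-time', 'domain'])
print(multi.index.names)
```

This route costs an extra copy of the frame, so for a file with millions of rows it is probably better to build the index directly while reading.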
For a MultiIndex, pass a list of column names to index_col, remove converters, and specify the column name in the parse_dates parameter:
import pandas as pd
from io import StringIO
temp=u"""2016-01-01:d1:0:0
2016-01-02:d2:0:1
2016-01-03:d3:1:0"""
#after testing, replace StringIO(temp) with r'c:\am\wiki_stats\topandas.txt'
df = pd.read_csv(StringIO(temp),
sep=':',
header=None,
names = ['date-time','domain','requests-qty','response-bytes'],
parse_dates=['date-time'],
index_col = ['date-time','domain'])
print (df)
                   requests-qty  response-bytes
date-time  domain
2016-01-01 d1                 0               0
2016-01-02 d2                 0               1
2016-01-03 d3                 1               0
print (df.index)
MultiIndex([('2016-01-01', 'd1'),
('2016-01-02', 'd2'),
('2016-01-03', 'd3')],
names=['date-time', 'domain'])
EDIT1: Solution with the append parameter in set_index:
import pandas as pd
from io import StringIO
temp=u"""2016-01-01:d1:0:0
2016-01-02:d2:0:1
2016-01-03:d3:1:0"""
#after testing, replace StringIO(temp) with r'c:\am\wiki_stats\topandas.txt'
df = pd.read_csv(StringIO(temp),
sep=':',
header=None,
names = ['date-time','domain','requests-qty','response-bytes'],
parse_dates=['date-time'],
index_col = 'date-time')
print (df)
           domain  requests-qty  response-bytes
date-time
2016-01-01     d1             0               0
2016-01-02     d2             0               1
2016-01-03     d3             1               0
print (df.index)
DatetimeIndex(['2016-01-01', '2016-01-02', '2016-01-03'],
dtype='datetime64[ns]', name='date-time', freq=None)
df1 = df.set_index(['domain'], append = True)
print (df1)
                   requests-qty  response-bytes
date-time  domain
2016-01-01 d1                 0               0
2016-01-02 d2                 0               1
2016-01-03 d3                 1               0
print (df1.index)
MultiIndex([('2016-01-01', 'd1'),
('2016-01-02', 'd2'),
('2016-01-03', 'd3')],
names=['date-time', 'domain'])
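The second attempt in the question (setting other columns as the index and then trying to set 'date-time' back) fails for a related reason: without append=True, set_index replaces the existing index, so the date-time values are discarded entirely. The following sketch, built on a small hand-made frame rather than the original data, compares the two behaviours and shows one way to get back to a plain date-time index:

```python
import pandas as pd

# Hypothetical frame mirroring the structure in the question
df = pd.DataFrame(
    {'domain': ['d1', 'd2', 'd3'],
     'requests-qty': [0, 0, 1],
     'response-bytes': [0, 1, 0]},
    index=pd.DatetimeIndex(['2016-01-01', '2016-01-02', '2016-01-03'],
                           name='date-time'))

# Default behaviour: the old DatetimeIndex is thrown away
replaced = df.set_index(['requests-qty', 'domain'])
print(replaced.index.names)
print('date-time' in replaced.columns)

# append=True keeps 'date-time' as the first index level
kept = df.set_index(['requests-qty', 'domain'], append=True)
print(kept.index.names)

# Moving only the added levels back to columns restores the original index
restored = kept.reset_index(level=['requests-qty', 'domain'])
print(restored.index.name)
```

Once the index has been replaced, as in the replaced frame above, the date-time values cannot be recovered from that frame, so either append=True or a prior reset_index is needed.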
Upvotes: 1