Romain Jouin

Reputation: 4838

python - pandas : how to select by date

Why can I select by month in this case, but not by date?

dates = pd.date_range( start = "01/01/1931" ,  end  =  "01/02/1941" )
new_df_4 = new_df_3.reindex(dates)
new_df_4["1931-10"]

But this doesn't work:

new_df_4["1931-10-02"]

KeyError Traceback (most recent call last) in ()
----> 1 new_df_4["1931-10-02"]

/Users/romain/anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc in __getitem__(self, key)
   1990             return self._getitem_multilevel(key)
   1991         else:
-> 1992             return self._getitem_column(key)
   1993 
   1994     def _getitem_column(self, key):

/Users/romain/anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc in _getitem_column(self, key)
   2002         result = self._constructor(self._data.get(key))
   2003         if result.columns.is_unique:
-> 2004             result = result[key]
   2005 
   2006         return result

/Users/romain/anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc in __getitem__(self, key)
   1990             return self._getitem_multilevel(key)
   1991         else:
-> 1992             return self._getitem_column(key)
   1993 
   1994     def _getitem_column(self, key):

/Users/romain/anaconda/lib/python2.7/site-packages/pandas/core/frame.pyc in _getitem_column(self, key)
   1997         # get column
   1998         if self.columns.is_unique:
-> 1999             return self._get_item_cache(key)
   2000 
   2001         # duplicate columns & possible reduce dimensionality

/Users/romain/anaconda/lib/python2.7/site-packages/pandas/core/generic.pyc in _get_item_cache(self, item)
   1343         res = cache.get(item)
   1344         if res is None:
-> 1345             values = self._data.get(item)
   1346             res = self._box_item_values(item, values)
   1347             cache[item] = res

/Users/romain/anaconda/lib/python2.7/site-packages/pandas/core/internals.pyc in get(self, item, fastpath)
   3223 
   3224             if not isnull(item):
-> 3225                 loc = self.items.get_loc(item)
   3226             else:
   3227                 indexer = np.arange(len(self.items))[isnull(self.items)]

/Users/romain/anaconda/lib/python2.7/site-packages/pandas/indexes/base.pyc in get_loc(self, key, method, tolerance)
   1876                 return self._engine.get_loc(key)
   1877             except KeyError:
-> 1878                 return self._engine.get_loc(self._maybe_cast_indexer(key))
   1879 
   1880         indexer = self.get_indexer([key], method=method, tolerance=tolerance)

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4027)()

pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:3891)()

pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12408)()

pandas/hashtable.pyx in pandas.hashtable.PyObjectHashTable.get_item (pandas/hashtable.c:12359)()

KeyError: '1931-10-02'

Upvotes: 1

Views: 2080

Answers (1)

jezrael

Reputation: 862641

To select by month, use partial string indexing:

print (new_df_4["1931-10"])

Selecting with [] does not work when the string has the same resolution as the index (from the same docs):

Warning: However, if the string is treated as an exact match, the selection in DataFrame's [] will be column-wise and not row-wise, see Indexing Basics. For example, dft_minute['2011-12-31 23:59'] will raise KeyError as '2011-12-31 23:59' has the same resolution as the index and there is no column with such a name. To always have unambiguous selection, whether the row is treated as a slice or a single selection, use .loc.

In [95]: dft_minute.loc['2011-12-31 23:59']
Out[95]: 
a    1
b    4
Name: 2011-12-31 23:59:00, dtype: int64
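Applied to a daily index like the one in the question, the resolution rule can be sketched as follows. This is only an illustration: `df` is a hypothetical stand-in for `new_df_4`, filled with made-up integer values, and the behaviour of `[]` with a month string is left out because it depends on the pandas version.

```python
import numpy as np
import pandas as pd

# Hypothetical daily-indexed frame standing in for new_df_4 (values are made up).
dates = pd.date_range(start="1931-01-01", end="1941-01-02")
df = pd.DataFrame({"a": np.arange(len(dates))}, index=dates)

# Month string: coarser than the daily index, so .loc treats it as a slice of rows.
month = df.loc["1931-10"]
print(type(month).__name__, len(month))

# Day string: same resolution as the index, so .loc treats it as a single label.
day = df.loc["1931-10-02"]
print(type(day).__name__, day.name)

# With plain [], the day string is looked up among the column names instead.
try:
    df["1931-10-02"]
except KeyError as err:
    print("KeyError:", err)
```

The month lookup returns a DataFrame of rows, the day lookup returns a single row as a Series, and the `[]` lookup fails because there is no column named "1931-10-02".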

Use loc if you need to select by date:

new_df_4.loc["1931-10-02"]

Sample:

import numpy as np
import pandas as pd

np.random.seed(10)
dates = pd.date_range( start = "01/01/1931" ,  end  =  "01/02/1941" )
new_df_4  = pd.DataFrame({'a':np.random.randint(10, size=len(dates))}, index=dates)
print (new_df_4.head())
            a
1931-01-01  9
1931-01-02  4
1931-01-03  0
1931-01-04  1
1931-01-05  9

print (new_df_4["1931-10"])
            a
1931-10-01  9
1931-10-02  6
1931-10-03  9
1931-10-04  7
1931-10-05  8
1931-10-06  0
1931-10-07  9
1931-10-08  6
1931-10-09  0
1931-10-10  1
1931-10-11  0
...

print (new_df_4.loc["1931-10-02"])
a    6
Name: 1931-10-02 00:00:00, dtype: int32
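Since `[]` is ambiguous between rows and columns, it may be simpler to use `.loc` for every date-based row selection, including the month case. To my understanding, newer pandas releases deprecated and later removed row selection with a single date string in `[]`, so `.loc` is also the safer choice across versions. A minimal sketch, reusing the sample frame above; the names `october`, `first_week` and `one_day` are only illustrative:

```python
import numpy as np
import pandas as pd

np.random.seed(10)
dates = pd.date_range(start="1931-01-01", end="1941-01-02")
new_df_4 = pd.DataFrame({"a": np.random.randint(10, size=len(dates))}, index=dates)

# Whole month, selected row-wise.
october = new_df_4.loc["1931-10"]

# Date range: with label-based slicing both endpoints are included.
first_week = new_df_4.loc["1931-10-01":"1931-10-07"]

# Single day kept as a one-row DataFrame by slicing from the day to itself.
one_day = new_df_4.loc["1931-10-02":"1931-10-02"]

print(len(october), len(first_week), len(one_day))
```

Slicing from a day to itself is one way to get a DataFrame back instead of the Series that `new_df_4.loc["1931-10-02"]` returns.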

Upvotes: 4
