Stefano Potter
Stefano Potter

Reputation: 3577

Slicing data based on column in dataframe

I have a DataFrame like this:

Month   Day Year    TmaxF
 4       1  1912    56.00
 4       2  1912    56.00
 4       3  1912    74.00
 1       1  1913    38
 1       2  1913    28
 1       3  1913    21
 1       1  1914    30.00
 1       2  1914    31.00
 1       3  1914    20.00

and I want to only select data that is in the years 1913 and 1914. .isin isn't what I want here because this is a simplified dataset.

I am more looking for something like:

df.loc['1913':'1914'] 

but when I set Year as the index and run this code it returns the error:

TypeError: cannot do slice indexing on <class 'pandas.core.index.Int64Index'> with these indexers [1913] of <type 'str'>

df.info() returns:

Month     36397 non-null int64
Day       36397 non-null int64
Year      36397 non-null int64
TmaxF     35600 non-null float64

Upvotes: 1

Views: 2823

Answers (2)

dermen
dermen

Reputation: 5362

After you set the year as the index, use a slice

df.set_index('Year',inplace=True)
df.loc[slice('1913','1914'),:]
#     Month Day  TmaxF
#Year                 
#1913     1   1     38
#1913     1   2     28
#1913     1   3     21
#1914     1   1  30.00
#1914     1   2  31.00
#1914     1   3  20.00

Here I am using strings as the index:

df.index
#Index([u'1912', u'1912', u'1912', u'1913', u'1913', u'1913', u'1914', u'1914',
#   u'1914'],
#  dtype='object', name=u'Year')

It looks as your Year column was originally integer, so maybe your index is

df.index
#Int64Index([1912, 1912, 1912, 1913, 1913, 1913, 1914, 1914, 1914], dtype='int64', name=u'Year')

If so, then make your slicer ranges integers: df.loc[slice(1913,1914),:]

Upvotes: 0

tmrlvi
tmrlvi

Reputation: 2361

First of all, notice that you data is number (int64) and not string. From the way you tried to query the data I believe that you followed a guide with dates as indexes (in which case you can slice by the date or part of it)

Putting that aside, it is important to remember that df.loc is used to slice according to the index (which does not appear in the table you sent).

While you can set the year as index, the more elegant way to slice the data as you wish would be using "boolean indexing" :

df[(df.Year >= 1913) && (df.Year <= 1914)]

If you still insist on making the year as the index, this can be done as follows:

df.index = df.Year
df.loc[1913:1914]

Upvotes: 2

Related Questions