Reputation: 2661
I created a DataFrame as follows:
df1 = pandas.read_csv(ifile_name, header=None, sep=r"\s+", usecols=[0,1,2,3,4],
index_col=[0,1,2], names=["year", "month", "day", "something1", "something2"])
Now I would like to create another DataFrame containing only the rows where year > 2008, so I tried:
df2 = df1[df1.year>2008]
But I get this error:
AttributeError: 'DataFrame' object has no attribute 'year'
I guess it does not see "year" among the columns because I made it part of the index. How can I select the rows where year > 2008 in that case?
Upvotes: 4
Views: 6858
Reputation: 323376
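Assuming your index is sorted, you can slice by label with .loc. Label slices are inclusive, so this also keeps any rows for 2008 itself; start the slice at 2009 to exclude them: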
df.loc[2008:]
Out[259]:
      value
year
2010      2
2015      3
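As a rough sketch of how the same slice behaves on a three-level index like the one in the question: the sample rows below are made up for illustration, and the index is sorted explicitly because label slicing on a MultiIndex relies on that.

```python
import pandas as pd

# Hypothetical rows standing in for the question's file contents.
df1 = (
    pd.DataFrame(
        {
            "year": [2010, 2007, 2009, 2008],
            "month": [4, 1, 3, 2],
            "day": [8, 5, 7, 6],
            "something1": [0.4, 0.1, 0.3, 0.2],
            "something2": [4.0, 1.0, 3.0, 2.0],
        }
    )
    .set_index(["year", "month", "day"])
    .sort_index()  # label slicing needs a sorted index
)

# A scalar slice on a MultiIndex applies to the first level ("year").
inclusive = df1.loc[2008:]  # keeps the 2008 rows as well
strict = df1.loc[2009:]     # with integer years this means year > 2008

print(strict)
```

Starting the slice at 2009 only works as "greater than 2008" because the years are integers; for non-integer labels a boolean mask is the safer choice.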
Upvotes: 3
Reputation: 164823
You are correct that year is an index rather than a column. One solution is to use pd.DataFrame.query, which lets you use index names directly:
df = pd.DataFrame({'year': [2005, 2010, 2015], 'value': [1, 2, 3]})
df = df.set_index('year')
res = df.query('year > 2008')
print(res)
      value
year
2010      2
2015      3
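The same approach should carry over to the three-level index from the question, since query also resolves named MultiIndex levels. Here is a small sketch that mirrors the question's read_csv call; the in-memory text is invented sample data replacing ifile_name.

```python
import io

import pandas as pd

# Made-up whitespace-separated rows in place of the question's input file.
text = (
    "2007 1 5 0.1 1.0\n"
    "2008 2 6 0.2 2.0\n"
    "2009 3 7 0.3 3.0\n"
    "2010 4 8 0.4 4.0\n"
)

df1 = pd.read_csv(
    io.StringIO(text),
    header=None,
    sep=r"\s+",
    usecols=[0, 1, 2, 3, 4],
    index_col=[0, 1, 2],
    names=["year", "month", "day", "something1", "something2"],
)

# "year" is an index level here, yet query can still refer to it by name.
df2 = df1.query("year > 2008")
print(df2)
```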
Upvotes: 4
Reputation: 403128
Get the level by name using MultiIndex.get_level_values and create a boolean mask for row selection:
df2 = df1[df1.index.get_level_values('year') > 2008]
If you plan to modify the result, take an explicit copy of the selection so that later assignments to df2 do not trigger a SettingWithCopyWarning:
df2 = df1[df1.index.get_level_values('year') > 2008].copy()
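A self-contained sketch of the mask-and-copy pattern follows. The index values and column contents are hypothetical, chosen only to show that writing to the copy leaves df1 untouched.

```python
import pandas as pd

# Hypothetical index and values; only the level names match the question.
index = pd.MultiIndex.from_tuples(
    [(2007, 1, 5), (2008, 2, 6), (2009, 3, 7), (2010, 4, 8)],
    names=["year", "month", "day"],
)
df1 = pd.DataFrame(
    {"something1": [0.1, 0.2, 0.3, 0.4], "something2": [1.0, 2.0, 3.0, 4.0]},
    index=index,
)

# Pull out one level by name and compare it element-wise.
mask = df1.index.get_level_values("year") > 2008

# Copy the selection so that edits to df2 cannot affect df1.
df2 = df1[mask].copy()
df2["something1"] = 0.0

print(df2)
print(df1)
```

Unlike the .loc slice, this does not need a sorted index, and masks on several levels can be combined with & and |.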
Upvotes: 7