Reputation: 487
I have a sample dataset as shown below:
| Id | Year | Price |
|----|------|-------|
| 1 | 2000 | 10 |
| 1 | 2001 | 12 |
| 1 | 2002 | 15 |
| 2 | 2000 | 16 |
| 2 | 2001 | 20 |
| 2 | 2002 | 22 |
| 3 | 2000 | 15 |
| 3 | 2001 | 19 |
| 3 | 2002 | 26 |
I want to subset the dataset so that only the values for the last two years are considered. My plan is to create a variable 'end_year', assign a year to it, and then use it to filter the original dataframe down to the last two years. I want this as a variable because new data keeps coming in. I have tried the code below, but I'm getting an error.
end_year="2002"
df1=df[(df['Year'] >= end_year-1)]
Upvotes: 0
Views: 337
Reputation: 41327
Per the comments, `Year` is type `object` in the raw data. We should first cast it to `int` and then compare with a numeric `end_year`:
df.Year=df.Year.astype(int) # cast `Year` to `int`
end_year=2002 # now we can use `int` here too
df1=df[(df['Year'] >= end_year-1)]
|   | Id | Year | Price |
|---|----|------|-------|
| 1 | 1 | 2001 | 12 |
| 2 | 1 | 2002 | 15 |
| 4 | 2 | 2001 | 20 |
| 5 | 2 | 2002 | 22 |
| 7 | 3 | 2001 | 19 |
| 8 | 3 | 2002 | 26 |
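For reference, here is a minimal self-contained sketch of the same idea, rebuilding the sample data from the question with `Year` stored as strings. The helper name `last_two_years` is hypothetical, and the upper bound on `end_year` is an extra assumption (it only matters if `end_year` is earlier than the newest year in the data); the original one-liner filters with `>=` only.

```python
import pandas as pd

# Sample data from the question; Year is stored as strings (dtype object),
# which is the situation that makes `end_year - 1` comparisons fail.
df = pd.DataFrame({
    "Id": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Year": ["2000", "2001", "2002"] * 3,
    "Price": [10, 12, 15, 16, 20, 22, 15, 19, 26],
})


def last_two_years(frame, end_year):
    """Return rows whose Year is end_year or the year before it.

    Hypothetical helper: accepts end_year as int or numeric string.
    """
    end_year = int(end_year)
    # Work on a copy with Year cast to int, leaving the input untouched.
    out = frame.assign(Year=frame["Year"].astype(int))
    # between() is inclusive on both ends by default.
    return out[out["Year"].between(end_year - 1, end_year)]


df1 = last_two_years(df, "2002")
print(df1)
```

If the cutoff should always follow the newest data, `end_year` could instead be taken from the frame itself, e.g. `df["Year"].astype(int).max()`.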
Upvotes: 1