abhivemp

Reputation: 932

Drop rows after particular year Pandas

I have a column in my dataframe that has years in the following format:

2018-19
2017-18

The column is stored with the object data type. I want to convert it to datetime and then drop all rows before 1979-80. However, when I tried the line below, I got formatting errors. What is the correct, or a better, way of doing this?

BOS['Season'] = pd.to_datetime(BOS['Season'], format = '%Y%y')

I am quite new to Python, so I would appreciate it if you could tell me what I am doing wrong. Thanks!

Upvotes: 5

Views: 2991

Answers (2)

IMCoins

Reputation: 3306

I would use the .str.slice accessor of the Series to select the part of the string I want to keep (the four-digit start year) and pass that to pd.to_datetime(). After that, selecting rows with .loc[] and a boolean mask becomes easy.

import pandas as pd 

data = {
    'date' : ['2016-17', '2017-18', '2018-19', '2019-20']
}
df = pd.DataFrame(data)
print(df)
#       date
# 0  2016-17
# 1  2017-18
# 2  2018-19
# 3  2019-20

df['date'] = pd.to_datetime(df['date'].str.slice(0, 4), format='%Y')
print(df)
#         date
# 0 2016-01-01
# 1 2017-01-01
# 2 2018-01-01
# 3 2019-01-01


df = df.loc[ df['date'].dt.year < 2018 ]
print(df)
#         date
# 0 2016-01-01
# 1 2017-01-01
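
Applied to the question's actual goal (dropping everything before 1979-80), the same slicing idea could look like the sketch below. The sample rows are made up for illustration, since the real BOS dataframe is not shown; only the column name Season comes from the question.

```python
import pandas as pd

# Hypothetical sample data; the real BOS frame is not shown in the question.
BOS = pd.DataFrame(
    {'Season': ['1977-78', '1978-79', '1979-80', '2017-18', '2018-19']}
)

# Parse the four-digit start year of each season into a datetime.
start = pd.to_datetime(BOS['Season'].str.slice(0, 4), format='%Y')

# Keep seasons starting in 1979 or later, i.e. drop all rows before 1979-80.
recent = BOS.loc[start.dt.year >= 1979]
print(recent)
```

Keeping the parsed dates in a separate Series leaves the original Season strings untouched, which may be preferable if they are needed for display later.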

Upvotes: 1

jezrael

Reputation: 862511

I think the simplest approach here is to compare the years separately, e.g. the part before the -:

print (BOS)
    Season
0  1979-80
1  2018-19
2  2017-18


df = BOS[BOS['Season'].str.split('-').str[0].astype(int) < 2017]
print (df)
    Season
0  1979-80

Details:

First, each value is split into a list by Series.str.split, and then the first element of each list is selected:

print (BOS['Season'].str.split('-'))
0    [1979, 80]
1    [2018, 19]
2    [2017, 18]
Name: Season, dtype: object

print (BOS['Season'].str.split('-').str[0])
0    1979
1    2018
2    2017
Name: Season, dtype: object
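
Note that the selected values are still strings (dtype object), which is why the filter above calls .astype(int) before comparing. If some Season values might be malformed, .astype(int) would raise an error; one possible alternative, shown here as a sketch with made-up data, is pd.to_numeric with errors='coerce', which turns unparseable values into NaN so that they simply fail the comparison.

```python
import pandas as pd

# Hypothetical sample data, including one malformed value for illustration.
BOS = pd.DataFrame({'Season': ['1979-80', '2018-19', 'unknown', '1975-76']})

# Take the part before the hyphen and convert it to a number;
# values that cannot be parsed become NaN instead of raising.
start_year = pd.to_numeric(BOS['Season'].str.split('-').str[0], errors='coerce')

# NaN compares as False, so malformed rows are dropped along with old seasons.
kept = BOS[start_year >= 1979]
print(kept)
```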

Or convert both years to separate columns:

BOS['start'] = pd.to_datetime(BOS['Season'].str.split('-').str[0],  format='%Y').dt.year
BOS['end'] =  BOS['start'] + 1
print (BOS)
    Season  start   end
0  1979-80   1979  1980
1  2018-19   2018  2019
2  2017-18   2017  2018
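
With the helper columns in place, the filtering itself becomes a plain integer comparison. The following is a minimal sketch, assuming made-up sample rows and using .astype(int) in place of the pd.to_datetime round trip to get the start year:

```python
import pandas as pd

# Hypothetical sample data; the real BOS frame is not shown in the question.
BOS = pd.DataFrame({'Season': ['1979-80', '2018-19', '2017-18', '1978-79']})

# Integer start year taken from the part before the hyphen; end year is one later.
BOS['start'] = BOS['Season'].str.split('-').str[0].astype(int)
BOS['end'] = BOS['start'] + 1

# Drop seasons before 1979-80 and order the remaining rows chronologically.
filtered = BOS[BOS['start'] >= 1979].sort_values('start').reset_index(drop=True)
print(filtered)
```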

Upvotes: 4
