I have a dataset with a messy 'date' column. Depending on the row, it holds the year only (4 digits), the year and month (6 digits), or the year, month, and day (8 digits).
Name Color date
0 K A 2011
1 Y B 201411
2 B C 20151231
3 B A 2019
4 C B 201911
5 A A 20120507
6 Q G 20130601
I want to extract only the rows from 2019. How can I do this? The expected output is:
Name Color date
0 B A 2019
1 C B 201911
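If you also need real dates rather than just the year, one option (not from the original thread) is to parse each value with the format that matches its length. This is a minimal sketch, assuming every value has exactly 4, 6, or 8 digits; missing month and day parts default to 1, so 2019 becomes 2019-01-01. The 'parsed' column name is hypothetical.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Name": ["K", "Y", "B", "B", "C", "A", "Q"],
    "Color": ["A", "B", "C", "A", "B", "A", "G"],
    "date": [2011, 201411, 20151231, 2019, 201911, 20120507, 20130601],
})

# Assumption: only these three lengths occur in the column
formats = {4: "%Y", 6: "%Y%m", 8: "%Y%m%d"}

as_text = df["date"].astype(str)
parsed = pd.Series(pd.NaT, index=df.index, dtype="datetime64[ns]")
for length, fmt in formats.items():
    mask = as_text.str.len() == length
    # Parse only the rows whose length matches this format
    parsed.loc[mask] = pd.to_datetime(as_text[mask], format=fmt)

df["parsed"] = parsed  # hypothetical helper column
df_2019 = df[df["parsed"].dt.year == 2019]
print(df_2019)
```

A value with any other length stays NaT, which makes malformed entries easy to spot with df["parsed"].isna().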
df[df['date'].astype('str').str.startswith('2019')]
Here df is the DataFrame holding the data you posted.
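A self-contained version of the one-liner above, using the sample data from the question. The reset_index call is an addition, only there to renumber the rows 0, 1 as in the expected output; drop it if you want to keep the original index.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Name": ["K", "Y", "B", "B", "C", "A", "Q"],
    "Color": ["A", "B", "C", "A", "B", "A", "G"],
    "date": [2011, 201411, 20151231, 2019, 201911, 20120507, 20130601],
})

# Compare as strings so 2019, 201911 and 20190101 all match the prefix "2019"
result = df[df["date"].astype(str).str.startswith("2019")]

# Optional: renumber the remaining rows from 0
result = result.reset_index(drop=True)
print(result)
```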
Your date column is not consistent: some values are a year only, others a year and month, and others a full date. If the year is always the first four digits, you can convert the column to strings, slice off the first four characters, and filter on the year you want (assuming your DataFrame is called 'df'):
df['date'] = df['date'].astype(str)
year = df['date'].str.slice(0,4)
df[year == '2019'] # your desired rows
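A variant of the slicing approach that leaves the original 'date' column untouched and casts the sliced year to an integer. This is a sketch under the same assumption (the year is always the first four digits); the integer year also allows range filters such as between.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Name": ["K", "Y", "B", "B", "C", "A", "Q"],
    "Color": ["A", "B", "C", "A", "B", "A", "G"],
    "date": [2011, 201411, 20151231, 2019, 201911, 20120507, 20130601],
})

# Slice the first four characters as the year, then convert to int
year = df["date"].astype(str).str.slice(0, 4).astype(int)

# Rows from 2019 only
only_2019 = df[year == 2019]

# Integer years also support ranges, e.g. 2015 through 2019 inclusive
recent = df[year.between(2015, 2019)]
print(only_2019)
print(recent)
```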