Reputation: 786
I have a pandas dataframe with columns as shown:
data = {"col1":[1, 2, 3], "col2":[1, 2, 3],"2020-10-19":[1, 2, 3], "2018-10-19":[4, 5, 6], "2016-10-19":[7, 8, 9]}
df = pd.DataFrame(data)
allcolumns= list(df.columns)
Now I want to go through the allcolumns variable and remove any element that is not in the format of date shown above. So my allcolumns should have elements ["2020-10-19", "2018-10-19", "2016-10-19"]. col1 and col2 should be deleted. Note:- I am very new to pandas, and still trying to learn.
Upvotes: 0
Views: 92
Reputation: 34086
You can use pd.to_datetime
:
In [4550]: x = pd.to_datetime(allcolumns, errors='coerce')
In [4551]: x
Out[4551]: DatetimeIndex(['NaT', 'NaT', '2020-10-19', '2018-10-19', '2016-10-19'], dtype='datetime64[ns]', freq=None)
You can remove NaT
like:
In [4563]: cols = [i for i in pd.to_datetime(allcolumns, errors='coerce') if not pd.isnull(i)]
In [4564]: cols
Out[4564]:
[Timestamp('2020-10-19 00:00:00'),
Timestamp('2018-10-19 00:00:00'),
Timestamp('2016-10-19 00:00:00')]
To get only max
value, you can do:
In [4550]: x = pd.to_datetime(allcolumns, errors='coerce')
In [4568]: pd.Series(x).max()
Out[4568]: Timestamp('2020-10-19 00:00:00')
Upvotes: 1