TanuAD
TanuAD

Reputation: 786

Python - How to get columns of the DataFrame and then filter it to keep only certain values

I have a pandas dataframe with columns as shown:

data = {"col1":[1, 2, 3], "col2":[1, 2, 3],"2020-10-19":[1, 2, 3], "2018-10-19":[4, 5, 6], "2016-10-19":[7, 8, 9]}

df = pd.DataFrame(data)
allcolumns= list(df.columns)

Now I want to go through the allcolumns variable and remove any element that is not in the format of date shown above. So my allcolumns should have elements ["2020-10-19", "2018-10-19", "2016-10-19"]. col1 and col2 should be deleted. Note:- I am very new to pandas, and still trying to learn.

Upvotes: 0

Views: 92

Answers (1)

Mayank Porwal
Mayank Porwal

Reputation: 34086

You can use pd.to_datetime:

In [4550]: x = pd.to_datetime(allcolumns, errors='coerce')

In [4551]: x
Out[4551]: DatetimeIndex(['NaT', 'NaT', '2020-10-19', '2018-10-19', '2016-10-19'], dtype='datetime64[ns]', freq=None)

You can remove NaT like:

In [4563]: cols = [i for i in pd.to_datetime(allcolumns, errors='coerce') if not pd.isnull(i)]

In [4564]: cols
Out[4564]: 
[Timestamp('2020-10-19 00:00:00'),
 Timestamp('2018-10-19 00:00:00'),
 Timestamp('2016-10-19 00:00:00')]

To get only max value, you can do:

In [4550]: x = pd.to_datetime(allcolumns, errors='coerce')
In [4568]: pd.Series(x).max()
Out[4568]: Timestamp('2020-10-19 00:00:00')

Upvotes: 1

Related Questions