Reputation: 289
I am writing some code in Python using Pandas to process a CSV file and manipulate the data in various ways within a dataframe. At present, with the script I've already written, df.head() calls up the following dataframe (simplified):
Facility Facility Servers Servers
Date 27/10/2016 28/10/2016 27/10/2016 28/10/2016
00:30 7 6 5 5
01:00 4 5 5 7
01:30 7 8 6 5
02:00 7 2 5 5
What I want to do now is to remove the columns that DON'T contain a specific date. So, in this case, if I want to remove all columns NOT containing 28/10/2016 the result would be:
Facility Servers
Date 28/10/2016 28/10/2016
00:30 6 5
01:00 5 7
01:30 8 5
02:00 2 5
I have done a lot of reading and checked various posts on StackOverflow and can't quite find the right solution. Does anyone have any idea how I could use some kind of conditional statement to remove any columns within my dataframe in which 28/10/2016 doesn't appear?
Upvotes: 1
Views: 50
Reputation: 863031
I think you need create MultiIndex.from_arrays
from first row of DataFrame
first, then remove first row by df.iloc[1:]
:
print (df.columns)
Index(['Facility', 'Facility', 'Servers', 'Servers'], dtype='object')
print (df.iloc[0])
Facility 27/10/2016
Facility 28/10/2016
Servers 27/10/2016
Servers 28/10/2016
Name: Date, dtype: object
df.columns = pd.MultiIndex.from_arrays([df.columns, pd.to_datetime(df.iloc[0])])
df = df.iloc[1:]
print (df)
Facility Servers
Date 2016-10-27 2016-10-28 2016-10-27 2016-10-28
00:30 7 6 5 5
01:00 4 5 5 7
01:30 7 8 6 5
02:00 7 2 5 5
And last use slicers:
idx = pd.IndexSlice
print (df.loc[:,idx[:,['2016-10-28']]])
Facility Servers
Date 2016-10-28 2016-10-28
00:30 6 5
01:00 5 7
01:30 8 5
02:00 2 5
Upvotes: 1