pottolom
pottolom

Reputation: 289

Python/Pandas drop column based on non-appearance of value in column

I am writing some code in Python using Pandas to process a CSV file and manipulate the data in various ways within a dataframe. At present, with the script I've already written, df.head() calls up the following dataframe (simplified):

        Facility    Facility    Servers     Servers
Date    27/10/2016  28/10/2016  27/10/2016  28/10/2016
00:30   7           6           5           5
01:00   4           5           5           7
01:30   7           8           6           5
02:00   7           2           5           5

What I want to do now is to remove the columns that DON'T contain a specific date. So, in this case, if I want to remove all columns NOT containing 28/10/2016 the result would be:

        Facility    Servers     
Date    28/10/2016  28/10/2016
00:30   6           5
01:00   5           7
01:30   8           5
02:00   2           5

I have done a lot of reading and checked various posts on StackOverflow and can't quite find the right solution. Does anyone have any idea how I could use some kind of conditional statement to remove any columns within my dataframe in which 28/10/2016 doesn't appear?

Upvotes: 1

Views: 50

Answers (1)

jezrael
jezrael

Reputation: 863031

I think you need create MultiIndex.from_arrays from first row of DataFrame first, then remove first row by df.iloc[1:]:

print (df.columns)
Index(['Facility', 'Facility', 'Servers', 'Servers'], dtype='object')

print (df.iloc[0])
Facility    27/10/2016
Facility    28/10/2016
Servers     27/10/2016
Servers     28/10/2016
Name: Date, dtype: object

df.columns = pd.MultiIndex.from_arrays([df.columns, pd.to_datetime(df.iloc[0])])
df = df.iloc[1:]
print (df)
        Facility               Servers           
Date  2016-10-27 2016-10-28 2016-10-27 2016-10-28
00:30          7          6          5          5
01:00          4          5          5          7
01:30          7          8          6          5
02:00          7          2          5          5

And last use slicers:

idx = pd.IndexSlice
print (df.loc[:,idx[:,['2016-10-28']]])
        Facility    Servers
Date  2016-10-28 2016-10-28
00:30          6          5
01:00          5          7
01:30          8          5
02:00          2          5

Upvotes: 1

Related Questions