Reputation: 18198
I am reading a file that has several blank columns, like this:
Raw data as text:
id stage D1 D2 D3 D4 D5 D6
1 base A
1 s1 2 2 4 5
1 s2 3 3 6 7
2 base AA
2 s1 5 3 4 3
2 s2 3 3 2 4
2 s3 2 2 3 6
3 base B
3 s1 4 4 4 5
4 base BC
I don't know the names of the blank columns, and there are many of them.
How can I detect that D2 is blank (has no data in any row) and then drop it?
I could iterate over the columns/rows to find which columns are blank, but I don't think that is the right way to do this in Python.
What is the correct way of doing this in Python?
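The raw text above has lost its delimiters, so it is not clear where the blanks fall. The sketch below assumes the file is comma-separated. It also assumes that D2 and D5 are empty on every row; the D5 placement is only a guess. It shows that pandas read_csv turns empty fields into NaN, so a blank column becomes a column of NaN, which is what the answers below detect.

```python
import io

import pandas as pd

# Hypothetical stand-in for the file: comma-separated, with D2 and D5
# assumed empty on every row (the real column positions are unknown).
raw = """id,stage,D1,D2,D3,D4,D5,D6
1,base,A,,,,,
1,s1,2,,2,4,,5
1,s2,3,,3,6,,7
2,base,AA,,,,,
2,s1,5,,3,4,,3
"""

# read_csv parses empty fields as NaN, so an all-blank column is all NaN.
df = pd.read_csv(io.StringIO(raw))

# True for every column in which all values are missing.
print(df.isnull().all())
```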
Upvotes: 3
Views: 2511
Reputation: 29
Count the NULL values in every column of your DataFrame:
df.isnull().sum()
To get the NULL count of a specific column:
df.isnull().sum()['D2']
To check whether the entire column is empty, compare that count with the length of the DataFrame:
df.isnull().sum()['D2'] == len(df)
Then drop the column:
df.drop('D2', axis=1, inplace=True)
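The question says the blank column names are not known in advance. Applying the same null-count test to every column at once handles that. The sketch below does this on a small made-up DataFrame; null_counts and blank_cols are illustrative names, not part of any API.

```python
import numpy as np
import pandas as pd

# Small made-up frame in which D2 is entirely empty (NaN).
df = pd.DataFrame({
    "id": [1, 1, 2],
    "stage": ["base", "s1", "base"],
    "D1": ["A", 2, "AA"],
    "D2": [np.nan, np.nan, np.nan],
    "D3": [np.nan, 2, np.nan],
})

# A column is blank when its null count equals the number of rows.
null_counts = df.isnull().sum()
blank_cols = null_counts[null_counts == len(df)].index

# Drop every blank column in one call, without naming them by hand.
df = df.drop(columns=blank_cols)
print(list(df.columns))
```

The same selection can be written as df.columns[df.isnull().all()]. That form tests "all values are null" directly rather than comparing counts.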
Upvotes: 1
Reputation: 1604
With the keyword argument how='all', only the columns in which every row is empty are dropped:
df = df.dropna(axis=1, how='all')
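Here is a minimal sketch of how='all' on a made-up frame. It also covers one caveat: dropna only treats NaN/None as empty. A column that holds empty or whitespace-only strings is kept. The frame, and the choice to map whitespace-only strings to NaN first, are illustrative assumptions.

```python
import numpy as np
import pandas as pd

# Made-up frame: D2 is all NaN; D5 holds empty/blank strings, which are not NaN.
df = pd.DataFrame({
    "stage": ["base", "s1", "s2"],
    "D1": ["A", 2, 3],
    "D2": [np.nan, np.nan, np.nan],
    "D5": ["", " ", ""],
})

# how='all' drops D2 but keeps D5, because "" and " " count as values.
only_nan = df.dropna(axis=1, how="all")

# Converting whitespace-only strings to NaN first lets dropna remove D5 too.
cleaned = df.replace(r"^\s*$", np.nan, regex=True).dropna(axis=1, how="all")
print(list(only_nan.columns), list(cleaned.columns))
```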
Upvotes: 4
Reputation: 323356
Try dropna with thresh; thresh=1 requires a column to have at least one non-null value to be kept:
df = df.dropna(thresh=1, axis=1)
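For columns, thresh=1 behaves like how='all'. Raising thresh makes the filter stricter. The sketch below uses a made-up frame to show both settings. Note that recent pandas versions do not allow passing how and thresh together.

```python
import numpy as np
import pandas as pd

# Made-up frame: D2 has no values, D5 has exactly one, D1 has three.
df = pd.DataFrame({
    "D1": [1, 2, 3],
    "D2": [np.nan, np.nan, np.nan],
    "D5": [np.nan, 7, np.nan],
})

# thresh=1: keep a column only if it has at least 1 non-null value.
kept_1 = df.dropna(axis=1, thresh=1)

# thresh=2: D5, with a single value, is now dropped as well.
kept_2 = df.dropna(axis=1, thresh=2)
print(list(kept_1.columns), list(kept_2.columns))
```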
Upvotes: 1