How to remove missing data and 0s whilst keeping the dataframe the same shape using Pandas?

Question

I have a dataframe and I want to reformat it so that, in each row, any missing value or zero that occurs before the first non-zero value is removed. However, I do not want to delete any rows or columns, and I do not want to remove any zeros or missing values that appear after the first non-zero value.

Below is the dataframe I am working with:

> import numpy as np

> import pandas as pd

> data =[['Adam',2.55,4.53,3.45,2.12,3.14],['Bill',np.nan,2.14,3.65,4.12],['Chris',np.nan,0,2.82,0,6.04],['David',np.nan,0,7.42,3.52]]

> df = pd.DataFrame(data, columns = ['Name', 'A','B','C','D','E'])

Moreover, here is the expected outcome:

> data1 =[['Adam',2.55,4.53,3.45,2.12,3.14],['Bill',2.14,3.65,4.12],['Chris',2.82,0,6.04],['David',7.42,3.52]]

> df1 = pd.DataFrame(data1, columns = ['Name', 'A','B','C','D','E'])

anky · Accepted Answer

This is not a trivial problem. Here is the solution:

m = df.set_index('Name')
# Keep each row from its first positive value onward; earlier cells become NaN
m = m[m.isin(m.mask(m.le(0)).bfill(axis=1).iloc[:, 0]).cumsum(axis=1).astype(bool)]
print(m)

         A     B     C     D     E
Name                               
Adam   2.55  4.53  3.45  2.12  3.14
Bill    NaN  2.14  3.65  4.12   NaN
Chris   NaN   NaN  2.82  0.00  6.04
David   NaN   NaN  7.42  3.52   NaN

Then left-justify the remaining values using the justify function from the linked post:

pd.DataFrame(justify(m.values, np.nan), columns=m.columns, index=m.index).reset_index()

    Name     A     B     C     D     E
0   Adam  2.55  4.53  3.45  2.12  3.14
1   Bill  2.14  3.65  4.12   NaN   NaN
2  Chris  2.82  0.00  6.04   NaN   NaN
3  David  7.42  3.52   NaN   NaN   NaN
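
The justify function is not defined in this answer; it comes from the linked post. As a rough sketch of what such a helper might look like, assuming only left-justification of NaN-padded rows is needed, the hypothetical justify_left below packs each row's non-NaN values to the left. It is a simplified stand-in and not necessarily the linked post's implementation.

```python
import numpy as np


def justify_left(a, invalid_val=np.nan):
    """Hypothetical helper: push the valid entries of each row to the left
    and fill the freed cells on the right with invalid_val."""
    a = np.asarray(a, dtype=float)
    # True where a cell holds a value we want to keep
    if np.isnan(invalid_val):
        mask = ~np.isnan(a)
    else:
        mask = a != invalid_val
    # Sorting each row of the boolean mask in descending order packs the
    # True flags to the left while keeping the per-row count unchanged
    justified_mask = np.sort(mask, axis=1)[:, ::-1]
    out = np.full(a.shape, invalid_val, dtype=float)
    # Both masks are traversed row by row, so the values keep their order
    out[justified_mask] = a[mask]
    return out


# Values of m after the masking step shown earlier
masked = np.array([
    [2.55, 4.53, 3.45, 2.12, 3.14],
    [np.nan, 2.14, 3.65, 4.12, np.nan],
    [np.nan, np.nan, 2.82, 0.0, 6.04],
    [np.nan, np.nan, 7.42, 3.52, np.nan],
])
shifted = justify_left(masked)
print(shifted)
```

With a helper like this, justify_left(m.values) could take the place of justify(m.values, np.nan) in the call above. Note that the zero in the Chris row is kept, because only NaN is treated as invalid.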

Explanation:

Step 1: Set the Name column as the index so that only numeric values remain.

Step 2: m.mask(m.le(0)).bfill(axis=1).iloc[:,0] gives, for each row, the first value that is greater than 0.

Step 3: isin() then returns True wherever that value appears in its row.

Step 4: cumsum(axis=1).astype(bool) turns every element from that position onward into True, so indexing m with this mask keeps those values and sets all other values to NaN.

Finally, use the justify function from the linked post to shift the remaining values to the left.
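
To make the steps easier to follow, the one-liner can be split into named intermediates. This is only an illustrative breakdown: the variable names first_pos, hits, keep and alt_keep are assumptions, the short rows from the question are padded with NaN explicitly, and the final alt_keep line is an alternative mask that is not part of the answer, included purely as a cross-check.

```python
import numpy as np
import pandas as pd

data = [
    ['Adam', 2.55, 4.53, 3.45, 2.12, 3.14],
    ['Bill', np.nan, 2.14, 3.65, 4.12, np.nan],
    ['Chris', np.nan, 0, 2.82, 0, 6.04],
    ['David', np.nan, 0, 7.42, 3.52, np.nan],
]
df = pd.DataFrame(data, columns=['Name', 'A', 'B', 'C', 'D', 'E'])

# Step 1: keep only the numeric columns by moving Name into the index
m = df.set_index('Name')

# Step 2: hide zeros and negatives, back-fill along each row, and read the
# first column, which now holds each row's first positive value
first_pos = m.mask(m.le(0)).bfill(axis=1).iloc[:, 0]

# Step 3: with a Series argument, isin compares row by row on the index,
# so a cell is True where it equals that row's first positive value
hits = m.isin(first_pos)

# Step 4: the running sum is non-zero from the first hit onward
keep = hits.cumsum(axis=1).astype(bool)

# Cells outside the mask become NaN
result = m[keep]
print(result)

# Cross-check (not from the answer): True from the first positive cell onward
alt_keep = m.gt(0).cumsum(axis=1).gt(0)
```

Because Step 3 matches on the index, the Name values need to be unique for isin to work this way.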
