Reputation: 143
I have a dataframe and I want to reformat it in order it remove the instances of whether a missing value or a zero occurs before the first non-zero value appears across a row. However I do not want to delete any rows or columns and I do not want to remove any 0s or missing values which appear after the non-zeroes.
Below is the dataframe I am working with:
> data =[['Adam',2.55,4.53,3.45,2.12,3.14],['Bill',np.NaN,2.14,3.65,4.12],['Chris',np.NaN,0,2.82,0,6.04],['David',np.NaN,0,7.42,3.52]]
> df = pd.DataFrame(data, columns = ['Name', 'A','B','C','D','E'])
Moreover, here is the expected outcome:
> data1 =[['Adam',2.55,4.53,3.45,2.12,3.14],['Bill',2.14,3.65,4.12],['Chris',2.82,0,6.04],['David',7.42,3.52]]
> df1 = pd.DataFrame(data1, columns = ['Name', 'A','B','C','D','E'])
Upvotes: 1
Views: 392
Reputation: 75080
This is not a trivial problem. Here is the solution:
m=df.set_index('Name')
m=m[m.isin(m.mask(m.le(0)).bfill(axis=1).iloc[:,0]).cumsum(axis=1).astype(bool)]
print(m)
A B C D E
Name
Adam 2.55 4.53 3.45 2.12 3.14
Bill NaN 2.14 3.65 4.12 NaN
Chris NaN NaN 2.82 0.00 6.04
David NaN NaN 7.42 3.52 NaN
Then using justify
:
pd.DataFrame(justify(m.values,np.nan),columns=m.columns,index=m.index).reset_index()
Name A B C D E
0 Adam 2.55 4.53 3.45 2.12 3.14
1 Bill 2.14 3.65 4.12 NaN NaN
2 Chris 2.82 0.00 6.04 NaN NaN
3 David 7.42 3.52 NaN NaN NaN
Explanation:
Step1: Set the Name
column as index so we can deal with numeric values only.
Step2: m.mask(m.le(0)).bfill(axis=1).iloc[:,0]
gives the first value which is greater than 0.
Step3: Then using isin()
to return True
wherever the value appears in each row.
Step4: cumsum(axis=1).astype(bool)
makes all the remaining elements as True so we can filter only those values, other values becomes NaN
.
Then use justify function from the linked post.
Upvotes: 1