Reputation: 151
I have a dataframe as follows:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 19780116 930 1 A0X 62 63 5 6.0 63.0 80 -26 0 33CATA 11 NaN
1 1 19780116 930 3 C21 57 58 105 107.0 61.0 75 18 1 33CATA 11 NaN
2 2 19780116 930 4 B01 -17 -16 21 12.0 28.0 1 -28 0 33CATA 11 NaN
3 3 19780117 750 3 C21 67 69 95 137.0 70.0 72 18 1 44CATA 11 NaN
4 4 19780117 750 6 A01 - 4 - 2.0 53.0 28 23 1 -27 0 44CATA
In the last row (and several others in the full dataframe) the negative signs are misplaced.
My question is if a negative sign is there at i-th
position then it should merge i+1 th
element and shift all other row elements to the left such that it the final output should be
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 19780116 930 1 A0X 62 63.0 5.0 6.0 63.0 80 -26 0 33CATA 11.0 NaN
1 1 19780116 930 3 C21 57 58.0 105.0 107.0 61.0 75 18 1 33CATA 11.0 NaN
2 2 19780116 930 4 B01 -17 -16.0 21.0 12.0 28.0 1 -28 0 33CATA 11.0 NaN
3 3 19780117 750 3 C21 67 69.0 95.0 137.0 70.0 72 18 1 44CATA 11.0 NaN
4 4 19780117 750 6 A01 -4 -2.0 53.0 28.0 23.0 1 -27 0 44CATA NaN NaN
Upvotes: 4
Views: 90
Reputation: 71707
>>> df
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 19780116 930 1 A0X 62 63 5 6.0 63.0 80 -26 0 33CATA 11 NaN
1 1 19780116 930 3 C21 57 58 105 107.0 61.0 75 18 1 33CATA 11 NaN
2 2 19780116 930 4 B01 -17 -16 21 12.0 28.0 1 -28 0 33CATA 11 NaN
3 3 19780117 750 3 C21 67 69 95 137.0 70.0 72 18 1 44CATA 11 NaN
4 4 19780117 750 6 A01 - 4 - 2.0 53.0 28 23 1 -27 0 44CATA
c = df.copy()
m1 = c.eq('-')
m2 = m1.shift(axis=1, fill_value=False)
c.update(c.astype(str)[m2].radd('-').astype(float))
c = c.mask(m1)
i = np.argsort(c.isna(), axis=1)
c[:] = c.values[np.arange(len(c))[:, None], i]
Create a copy of the dataframe named as c
and compare with the character -
to create a boolean mask m1
>>> m1 = c.eq('-')
>>> m1
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 False False False False False False False False False False False False False False False False
1 False False False False False False False False False False False False False False False False
2 False False False False False False False False False False False False False False False False
3 False False False False False False False False False False False False False False False False
4 False False False False False True False True False False False False False False False False
Shift the boolean mask m1
along axis=1
to create a boolean mask m2
>>> m2 = m1.shift(axis=1, fill_value=False)
>>> m2
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 False False False False False False False False False False False False False False False False
1 False False False False False False False False False False False False False False False False
2 False False False False False False False False False False False False False False False False
3 False False False False False False False False False False False False False False False False
4 False False False False False False True False True False False False False False False False
Update the values in the dataframe c
by adding the character -
to values indexed by boolean mask m2
>>> c.update(c.astype(str)[m2].radd('-').astype(float))
>>> c
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 19780116 930 1 A0X 62 63.0 5 6.0 63.0 80 -26 0 33CATA 11 NaN
1 1 19780116 930 3 C21 57 58.0 105 107.0 61.0 75 18 1 33CATA 11 NaN
2 2 19780116 930 4 B01 -17 -16.0 21 12.0 28.0 1 -28 0 33CATA 11 NaN
3 3 19780117 750 3 C21 67 69.0 95 137.0 70.0 72 18 1 44CATA 11 NaN
4 4 19780117 750 6 A01 - -4.0 - -2.0 53.0 28 23 1 -27 0 44CATA
Mask the values in dataframe c
corresponding to mask m1
>>> c = c.mask(m1)
>>> c
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 19780116 930 1 A0X 62 63.0 5 6.0 63.0 80 -26 0 33CATA 11 NaN
1 1 19780116 930 3 C21 57 58.0 105 107.0 61.0 75 18 1 33CATA 11 NaN
2 2 19780116 930 4 B01 -17 -16.0 21 12.0 28.0 1 -28 0 33CATA 11 NaN
3 3 19780117 750 3 C21 67 69.0 95 137.0 70.0 72 18 1 44CATA 11 NaN
4 4 19780117 750 6 A01 NaN -4.0 NaN -2.0 53.0 28 23 1 -27 0 44CATA
Create a boolean mask with isna
and apply argsort
along axis=1
to get the indices that would sort the dataframe c
while also maintaining the relative order
>>> i = np.argsort(c.isna(), axis=1)
>>> i
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
1 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
2 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
3 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
4 0 1 2 3 4 6 8 9 10 11 12 13 14 15 5 7
Sort
the values in the dataframe c
along axis=1
using the above indices obtained using argsort
and assign the sorted values back to the dataframe c
>>> c[:] = c.values[np.arange(len(c))[:, None], i]
>>> c
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
0 0 19780116 930 1 A0X 62 63.0 5 6.0 63.0 80 -26 0 33CATA 11.0 NaN
1 1 19780116 930 3 C21 57 58.0 105 107.0 61.0 75 18 1 33CATA 11.0 NaN
2 2 19780116 930 4 B01 -17 -16.0 21 12.0 28.0 1 -28 0 33CATA 11.0 NaN
3 3 19780117 750 3 C21 67 69.0 95 137.0 70.0 72 18 1 44CATA 11.0 NaN
4 4 19780117 750 6 A01 -4.0 -2.0 53.0 28.0 23.0 1 -27 0 44CATA NaN NaN
Upvotes: 4