Phil_in_Tx
Phil_in_Tx

Reputation: 257

Shift row left by leading NaN's without removing all NaN's

Shift row left by leading NaN's without removing all NaN's How can I remove leading NaN's in pandas when reading in a csv file?

Example code:

df = pd.DataFrame({
'c1': [    20,      30,  np.nan,  np.nan,  np.nan,      17,  np.nan],
'c2': [np.nan,      74,      65,  np.nan,  np.nan,      74,      82],
'c3': [   250,     290,     340,     325,     345,     315,     248],
'c4': [   250,  np.nan,     340,     325,     345,     315,     248],
'c5': [np.nan,  np.nan,     340,  np.nan,     345,  np.nan,     248],
'c6': [np.nan,  np.nan,  np.nan,     325,     345,  np.nan,  np.nan]})

The code displays this

|  |   c1 |   c2 |  c3 |    c4 |    c5 |    c6 |
|:-|-----:|-----:|----:|------:|------:|------:|
|0 | 20.0 |  NaN | 250 | 250.0 |   NaN |   NaN |
|1 | 30.0 | 74.0 | 290 |   NaN |   NaN |   NaN |
|2 |  NaN | 65.0 | 340 | 340.0 | 340.0 |   NaN |
|3 |  NaN |  NaN | 325 | 325.0 |   NaN | 325.0 |
|4 |  NaN |  NaN | 345 | 345.0 | 345.0 | 345.0 |
|5 | 17.0 | 74.0 | 315 | 315.0 |   NaN |   NaN |
|6 |  NaN | 82.0 | 248 | 248.0 | 248.0 |   NaN |

I'd like to only reomve the leading NaN's so the result would look like this

|  |   c1 |   c2 |  c3 |    c4 |    c5 |    c6 |
|:-|-----:|-----:|----:|------:|------:|------:|
|0 |  20  |   NaN | 250.0 | 250.0 | NaN | NaN |
|1 |  30  |  74.0 | 290.0 |   NaN | NaN | NaN |
|2 |  65  | 340.0 | 340.0 | 340.0 | NaN | NaN |
|3 | 325  | 325.0 |   NaN | 325.0 | NaN | NaN |
|4 | 345  | 345.0 | 345.0 | 345.0 | NaN | NaN |
|5 |  17  |  74.0 | 315.0 | 315.0 | NaN | NaN |
|6 |  82  | 248.0 | 248.0 | 248.0 | NaN | NaN |

I have tried the following but that didn't work

response = pd.read_csv (r'MonthlyPermitReport.csv')
df = pd.DataFrame(response)
df.loc[df.first_valid_index():]

Help please.

Upvotes: 1

Views: 131

Answers (2)

Scott Boston
Scott Boston

Reputation: 153500

You can try this:

s = df.isna().cumprod(axis=1).sum(axis=1)
df.apply(lambda x: x.shift(-s[x.name]), axis=1)

Output:

      c1     c2     c3     c4  c5  c6
0   20.0    NaN  250.0  250.0 NaN NaN
1   30.0   74.0  290.0    NaN NaN NaN
2   65.0  340.0  340.0  340.0 NaN NaN
3  325.0  325.0    NaN  325.0 NaN NaN
4  345.0  345.0  345.0  345.0 NaN NaN
5   17.0   74.0  315.0  315.0 NaN NaN
6   82.0  248.0  248.0  248.0 NaN NaN

Details:

s, is a series that counts number of leading NaN in a row. isna finds all the NaN the dataframe, then using cumprod along the row axis we are eliminating NaN after a non-NaN value by multiplying by zero. Lastly, we use sum along row to calculate the number of place to shift each row.

Using dataframe apply with axis=1 (rowwise) the name of the pd.Series called in df.apply(axis=1) is the row index of the dataframe. Therefore we can fetch the number of periods to shift using, s defined above.

Upvotes: 3

BENY
BENY

Reputation: 323316

Let us try apply create the list then recreate the dataframe

out = pd.DataFrame(df.apply(lambda x : [x[x.notna().cumsum()>0].tolist()],1).str[0].tolist(),
                   index=df.index,
                   columns=df.columns)
Out[102]: 
      c1     c2     c3     c4  c5  c6
0   20.0    NaN  250.0  250.0 NaN NaN
1   30.0   74.0  290.0    NaN NaN NaN
2   65.0  340.0  340.0  340.0 NaN NaN
3  325.0  325.0    NaN  325.0 NaN NaN
4  345.0  345.0  345.0  345.0 NaN NaN
5   17.0   74.0  315.0  315.0 NaN NaN
6   82.0  248.0  248.0  248.0 NaN NaN

Upvotes: 1

Related Questions