Reputation: 257
Shift row left by leading NaN's without removing all NaN's How can I remove leading NaN's in pandas when reading in a csv file?
Example code:
df = pd.DataFrame({
'c1': [ 20, 30, np.nan, np.nan, np.nan, 17, np.nan],
'c2': [np.nan, 74, 65, np.nan, np.nan, 74, 82],
'c3': [ 250, 290, 340, 325, 345, 315, 248],
'c4': [ 250, np.nan, 340, 325, 345, 315, 248],
'c5': [np.nan, np.nan, 340, np.nan, 345, np.nan, 248],
'c6': [np.nan, np.nan, np.nan, 325, 345, np.nan, np.nan]})
The code displays this
| | c1 | c2 | c3 | c4 | c5 | c6 |
|:-|-----:|-----:|----:|------:|------:|------:|
|0 | 20.0 | NaN | 250 | 250.0 | NaN | NaN |
|1 | 30.0 | 74.0 | 290 | NaN | NaN | NaN |
|2 | NaN | 65.0 | 340 | 340.0 | 340.0 | NaN |
|3 | NaN | NaN | 325 | 325.0 | NaN | 325.0 |
|4 | NaN | NaN | 345 | 345.0 | 345.0 | 345.0 |
|5 | 17.0 | 74.0 | 315 | 315.0 | NaN | NaN |
|6 | NaN | 82.0 | 248 | 248.0 | 248.0 | NaN |
I'd like to only reomve the leading NaN's so the result would look like this
| | c1 | c2 | c3 | c4 | c5 | c6 |
|:-|-----:|-----:|----:|------:|------:|------:|
|0 | 20 | NaN | 250.0 | 250.0 | NaN | NaN |
|1 | 30 | 74.0 | 290.0 | NaN | NaN | NaN |
|2 | 65 | 340.0 | 340.0 | 340.0 | NaN | NaN |
|3 | 325 | 325.0 | NaN | 325.0 | NaN | NaN |
|4 | 345 | 345.0 | 345.0 | 345.0 | NaN | NaN |
|5 | 17 | 74.0 | 315.0 | 315.0 | NaN | NaN |
|6 | 82 | 248.0 | 248.0 | 248.0 | NaN | NaN |
I have tried the following but that didn't work
response = pd.read_csv (r'MonthlyPermitReport.csv')
df = pd.DataFrame(response)
df.loc[df.first_valid_index():]
Help please.
Upvotes: 1
Views: 131
Reputation: 153500
You can try this:
s = df.isna().cumprod(axis=1).sum(axis=1)
df.apply(lambda x: x.shift(-s[x.name]), axis=1)
Output:
c1 c2 c3 c4 c5 c6
0 20.0 NaN 250.0 250.0 NaN NaN
1 30.0 74.0 290.0 NaN NaN NaN
2 65.0 340.0 340.0 340.0 NaN NaN
3 325.0 325.0 NaN 325.0 NaN NaN
4 345.0 345.0 345.0 345.0 NaN NaN
5 17.0 74.0 315.0 315.0 NaN NaN
6 82.0 248.0 248.0 248.0 NaN NaN
Details:
s, is a series that counts number of leading NaN in a row. isna
finds all the NaN the dataframe, then using cumprod
along the row axis we are eliminating NaN after a non-NaN value by multiplying by zero. Lastly, we use sum
along row to calculate the number of place to shift each row.
Using dataframe apply
with axis=1
(rowwise) the name of the pd.Series called in df.apply(axis=1) is the row index of the dataframe. Therefore we can fetch the number of periods to shift
using, s defined above.
Upvotes: 3
Reputation: 323316
Let us try apply create the list then recreate the dataframe
out = pd.DataFrame(df.apply(lambda x : [x[x.notna().cumsum()>0].tolist()],1).str[0].tolist(),
index=df.index,
columns=df.columns)
Out[102]:
c1 c2 c3 c4 c5 c6
0 20.0 NaN 250.0 250.0 NaN NaN
1 30.0 74.0 290.0 NaN NaN NaN
2 65.0 340.0 340.0 340.0 NaN NaN
3 325.0 325.0 NaN 325.0 NaN NaN
4 345.0 345.0 345.0 345.0 NaN NaN
5 17.0 74.0 315.0 315.0 NaN NaN
6 82.0 248.0 248.0 248.0 NaN NaN
Upvotes: 1