Reputation: 3987
I have a DataFrame
like this:
>>> df = pd.DataFrame({'a': list('ABCD'), 'b': ['E',np.nan,np.nan,'F']})
a b
0 A E
1 B NaN
2 C NaN
3 D F
I am trying to fill NaN
with values of the previous column in the next row and dropping this second row. In other words, I want to combine the two rows with NaNs to form a single row without NaNs like this:
a b
0 A E
1 B C
2 D F
I have tried various flavors of df.fillna(method="<bfill/ffill>")
but this didn't give me the expected output.
I haven't found any other question about this problem, Here's one. And actually that DataFrame
is made from list of DataFrame
by doing .concat()
, you may notice that from indexes also. I am telling this because it may be easy to do in single row rather then in multiple rows.
I have found some suggestions to use shift
, combine_first
but non of them worked for me. You may try these too.
I also have found this too. It is a whole article about filling nan
values but I haven't found problem/answer like mine.
Upvotes: 2
Views: 347
Reputation: 4186
You can do this in two steps with a placeholder column. First you fill all the nans in column b
with the a
values from the next row. Then you apply the filtering. In this example I use ffill
with a limit of 1 to filter all nan values after the first, there's probably a better method.
import pandas as pd
import numpy as np
df=pd.DataFrame({"a":[1,2,3,3,4],"b":[1,2,np.nan,np.nan,4]})
# Fill all nans:
df['new_b'] = df['b'].fillna(df['a'].shift(-1))
df = df[df['b'].ffill(limit=1).notna()].copy() # .copy() because loc makes a view
df = df.drop('b', axis=1).rename(columns={'new_b': 'b'})
print(df)
# output:
# a b
# 0 1 1
# 1 2 2
# 2 3 2
# 4 4 4
Upvotes: 1
Reputation: 261685
OK misunderstood what you wanted to do the first time. The dummy example was a bit ambiguous.
Here is another:
>>> df = pd.DataFrame({'a': list('ABCD'), 'b': ['E',np.nan,np.nan,'F']})
a b
0 A E
1 B NaN
2 C NaN
3 D F
To my knowledge, this operation does not exist with pandas, so we will use numpy to do the work.
First transform the dataframe to numpy array and flatten
it to be one-dimensional. Then drop NaNs using pandas.isna
that is working on a larger range types than numpy.isnan
, and then reshape
the array to its original shape before transforming back to dataframe:
array = df.to_numpy().flatten()
pd.DataFrame(array[~pd.isna(array)].reshape(-1,df.shape[1]), columns=df.columns)
output:
a b
0 A E
1 B C
2 D F
It is also working for more complex examples, as long as the NaN pattern is conserved among columns with NaNs:
In:
a b c d
0 A H A2 H2
1 B NaN B2 NaN
2 C NaN C2 NaN
3 D I D2 I2
4 E NaN E2 NaN
5 F NaN F2 NaN
6 G J G2 J2
Out:
a b c d
0 A H A2 H2
1 B B2 C C2
2 D I D2 I2
3 E E2 F F2
4 G J G2 J2
In:
a b c
0 A F H
1 B NaN NaN
2 C NaN NaN
3 D NaN NaN
4 E G I
Out:
a b c
0 A F H
1 B C D
2 E G I
In case NaNs columns do not have the same pattern such as:
a b c d
0 A H A2 NaN
1 B NaN B2 NaN
2 C NaN C2 H2
3 D I D2 I2
4 E NaN E2 NaN
5 F NaN F2 NaN
6 G J G2 J2
You can apply the operation per group of two columns:
def elementwise_shift(df):
array = df.to_numpy().flatten()
return pd.DataFrame(array[~pd.isna(array)].reshape(-1,df.shape[1]), columns=df.columns)
(df.groupby(np.repeat(np.arange(df.shape[1]/2), 2), axis=1)
.apply(elementwise_shift)
)
output:
a b c d
0 A H A2 B2
1 B C C2 H2
2 D I D2 I2
3 E F E2 F2
4 G J G2 J2
Upvotes: 2