Ruth Aju
Ruth Aju

Reputation: 1

Data cleaning using pandas

A B
0 Abel 10
1 Ada 8
2 Beed 15
3 Calvin 13
4 NaN 6
5 NaN 17
6 NaN 35
7 NaN 12
8 NaN 20
9 NaN 19

I want to fill the cells with NaN values with the Non-NaN values circling from index 0 to index 3.

import pandas as pd
data = {'A': ['Abel', 'Ada', 'Beed', 'Calvin', np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],

'B': [10, 8, 15, 13, 6, 17, 35, 12, 20, 19]}
df = pd.DataFrame(data)

fill_values = df['A'].dropna().tolist()

df['A'] = df['A'].fillna(pd.Series(fill_values * (len(df) // len(fill_values) + 1))[:len(df)])
print(df)

It filled the NaN cells but it didn't start circling from the first index. It filled it with the previous value.

Expected output

A B
0 Abel 10
1 Ada 8
2 Beed 15
3 Calvin 13
4 Abel 6
5 Ada 17
6 Beed 35
7 Calvin 12
8 Abel 20
9 Ada 19

Upvotes: 0

Views: 97

Answers (2)

PaulS
PaulS

Reputation: 25393

Another possible solution, whose steps are:

  • df['A'].dropna() removes any NaN entries from column A, and .to_list() converts the remaining values into a list.

  • len(df) calculates the total number of rows in df.

  • (vals * (df_len // len(vals) + 1)) creates a list by repeating the list vals enough times to cover the entire length of df.

vals, df_len = df['A'].dropna().to_list(), len(df)
df['A'] = (vals * (df_len // len(vals) + 1))[:df_len]

Output:

        A   B
0    Abel  10
1     Ada   8
2    Beed  15
3  Calvin  13
4    Abel   6
5     Ada  17
6    Beed  35
7  Calvin  12
8    Abel  20
9     Ada  19

Upvotes: 0

mozway
mozway

Reputation: 262204

For a pure pandas approach, you could identify the NaNs, and perform a map after enumerating the NaNs with cumsum+mod:

s = df['A'].dropna(ignore_index=True)
m = df['A'].isna()

df.loc[m, 'A'] = m[m].cumsum().sub(1).mod(len(s)).map(s)

Output:

        A   B
0    Abel  10
1     Ada   8
2    Beed  15
3  Calvin  13
4    Abel   6
5     Ada  17
6    Beed  35
7  Calvin  12
8    Abel  20
9     Ada  19

Intermediates:

# s = df['A'].dropna(ignore_index=True)
0      Abel
1       Ada
2      Beed
3    Calvin
Name: A, dtype: object

# dataframe
        A   B      m  cumsum-1  mod filled_A
0    Abel  10  False       NaN  NaN      NaN
1     Ada   8  False       NaN  NaN      NaN
2    Beed  15  False       NaN  NaN      NaN
3  Calvin  13  False       NaN  NaN      NaN
4     NaN   6   True       0.0  0.0     Abel
5     NaN  17   True       1.0  1.0      Ada
6     NaN  35   True       2.0  2.0     Beed
7     NaN  12   True       3.0  3.0   Calvin
8     NaN  20   True       4.0  0.0     Abel
9     NaN  19   True       5.0  1.0      Ada

Variants

As suggested in comments, it is also possible to use itertools:

from itertools import cycle, islice

g = cycle(df['A'].dropna().tolist())
m = df['A'].isna()

df.loc[m, 'A'] = list(islice(g, m.sum()))

#         A   B
# 0    Abel  10
# 1     Ada   8
# 2    Beed  15
# 3  Calvin  13
# 4    Abel   6
# 5     Ada  17
# 6    Beed  35
# 7  Calvin  12
# 8    Abel  20
# 9     Ada  19

Or numpy.resize:

m = df['A'].isna()

df.loc[m, 'A'] = np.resize(df.loc[~m, 'A'], m.sum())

#         A   B
# 0    Abel  10
# 1     Ada   8
# 2    Beed  15
# 3  Calvin  13
# 4    Abel   6
# 5     Ada  17
# 6    Beed  35
# 7  Calvin  12
# 8    Abel  20
# 9     Ada  19

Upvotes: 0

Related Questions