keep nan in groubpy, and transform mean

Question

I'm trying to group a dataframe and keep the values that have nan in them to be in the dataframe. For that reason I'm using fillna("").

Then I take the numeric column l, and I want to take the mean of the group. I'm running into an error Could not convert "" to numeric. It's doing that because my l column had some nan values in it.

Ideally after my groupby, I would like to convert those now "" values back to nan so I can complete the .mean() calc.

the work around that I've done is to turn only the columns I'm grouping by to "" first, doing the .mean(), then turning the "" values back to nan after its done. Is there a better way to do this?

import random
random.seed(100)

df = pd.DataFrame({
'a':[random.randint(0, 1) for x in range(1000)],
'b':[random.randint(0, 1) for x in range(1000)],
'c':[random.choice([0, 1, np.nan]) for x in range(1000)],
'd':[random.randint(0, 1) for x in range(1000)],
'e':[random.choice([0, 1, np.nan]) for x in range(1000)],
'f':[random.choice(['id', 'id2']) for x in range(1000)],
'g':[random.choice(['apple', 'orange', 'pear']) for x in range(1000)],
'h':[random.randint(0, 1) for x in range(1000)],
'i':[random.randint(0, 1) for x in range(1000)],
'j':[random.randint(0, 1) for x in range(1000)],
'k':[random.randint(0, 1) for x in range(1000)],
'l':[random.choice([1, 2, 34, 6, 36, 990, np.nan, np.nan, 6, 5, 3, 2, 1, 1, 1]) for x in range(1000)]
})


df['mean_l'] = df.fillna("").groupby(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k'])['l'].transform(lambda x: x.mean())

Anton vBR · Accepted Answer

We need to avoid to fill-na values in the l column in this case as it is your target column.

One way would be to reassign the column l with assign.

df['mean_l'] = (df.fillna('')
                  .assign(l=df.l)
                  .groupby(list('abcdefghijk'))['l']
                  .transform('mean'))

keep nan in groubpy, and transform mean

Answers (1)

Related Questions