Matt W.
Matt W.

Reputation: 3722

keep nan in groubpy, and transform mean

I'm trying to group a dataframe and keep the values that have nan in them to be in the dataframe. For that reason I'm using fillna("").

Then I take the numeric column l, and I want to take the mean of the group. I'm running into an error Could not convert "" to numeric. It's doing that because my l column had some nan values in it.

Ideally after my groupby, I would like to convert those now "" values back to nan so I can complete the .mean() calc.

the work around that I've done is to turn only the columns I'm grouping by to "" first, doing the .mean(), then turning the "" values back to nan after its done. Is there a better way to do this?

import random
random.seed(100)

df = pd.DataFrame({
'a':[random.randint(0, 1) for x in range(1000)],
'b':[random.randint(0, 1) for x in range(1000)],
'c':[random.choice([0, 1, np.nan]) for x in range(1000)],
'd':[random.randint(0, 1) for x in range(1000)],
'e':[random.choice([0, 1, np.nan]) for x in range(1000)],
'f':[random.choice(['id', 'id2']) for x in range(1000)],
'g':[random.choice(['apple', 'orange', 'pear']) for x in range(1000)],
'h':[random.randint(0, 1) for x in range(1000)],
'i':[random.randint(0, 1) for x in range(1000)],
'j':[random.randint(0, 1) for x in range(1000)],
'k':[random.randint(0, 1) for x in range(1000)],
'l':[random.choice([1, 2, 34, 6, 36, 990, np.nan, np.nan, 6, 5, 3, 2, 1, 1, 1]) for x in range(1000)]
})


df['mean_l'] = df.fillna("").groupby(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k'])['l'].transform(lambda x: x.mean())

Upvotes: 0

Views: 105

Answers (1)

Anton vBR
Anton vBR

Reputation: 18914

We need to avoid to fill-na values in the l column in this case as it is your target column.

One way would be to reassign the column l with assign.

df['mean_l'] = (df.fillna('')
                  .assign(l=df.l)
                  .groupby(list('abcdefghijk'))['l']
                  .transform('mean'))

Upvotes: 1

Related Questions