Reputation: 3722
I'm trying to group a dataframe
and keep the values that have nan
in them to be in the dataframe
. For that reason I'm using fillna("")
.
Then I take the numeric column l
, and I want to take the mean of the group. I'm running into an error Could not convert "" to numeric
. It's doing that because my l
column had some nan
values in it.
Ideally after my groupby, I would like to convert those now "" values back to nan
so I can complete the .mean()
calc.
the work around that I've done is to turn only the columns I'm grouping by to "" first, doing the .mean()
, then turning the "" values back to nan
after its done. Is there a better way to do this?
import random
random.seed(100)
df = pd.DataFrame({
'a':[random.randint(0, 1) for x in range(1000)],
'b':[random.randint(0, 1) for x in range(1000)],
'c':[random.choice([0, 1, np.nan]) for x in range(1000)],
'd':[random.randint(0, 1) for x in range(1000)],
'e':[random.choice([0, 1, np.nan]) for x in range(1000)],
'f':[random.choice(['id', 'id2']) for x in range(1000)],
'g':[random.choice(['apple', 'orange', 'pear']) for x in range(1000)],
'h':[random.randint(0, 1) for x in range(1000)],
'i':[random.randint(0, 1) for x in range(1000)],
'j':[random.randint(0, 1) for x in range(1000)],
'k':[random.randint(0, 1) for x in range(1000)],
'l':[random.choice([1, 2, 34, 6, 36, 990, np.nan, np.nan, 6, 5, 3, 2, 1, 1, 1]) for x in range(1000)]
})
df['mean_l'] = df.fillna("").groupby(['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k'])['l'].transform(lambda x: x.mean())
Upvotes: 0
Views: 105
Reputation: 18914
We need to avoid to fill-na values in the l
column in this case as it is your target column.
One way would be to reassign the column l
with assign.
df['mean_l'] = (df.fillna('')
.assign(l=df.l)
.groupby(list('abcdefghijk'))['l']
.transform('mean'))
Upvotes: 1