Reputation: 3458
I have a very sparse dataframe df
such as this:
Apples Bananas Pineapple Mango
Mary Apples NaN NaN NaN
Jane NaN Bananas NaN NaN
Diego NaN NaN NaN Mango
Guido NaN NaN Pineapple NaN
and I would like to build a dictionary d
such as
d = {'Apples':3, 'Bananas':1, 'Pineapple':2, 'Mango': 15}
to obtain
Apples Bananas Pineapple Mango
Mary 3 NaN NaN NaN
Jane NaN 1 NaN NaN
Diego NaN NaN NaN 15
Guido NaN NaN 2 NaN
I can do
df.to_sparse().replace(d)
but it's been over 30' and no output yet. My dataframe has 10000 rows times 1500 columns, the dataframe is originally 135MB, that becomes 850kB after to_sparse(). Is there any faster way?
Upvotes: 1
Views: 159
Reputation: 863166
EDIT by changed question - you can use stack
for Series
with MultiIndex
- (NaN
s values are removed) - with map
and then for reshape back unstack
:
np.random.seed(1235)
N = 1000
d = {'Apples':3, 'Bananas':1, 'Pineapple':2, 'Mango': 15}
df = pd.DataFrame(np.random.choice(list(d.keys()) + [np.nan],
size=(N, N),
p=(0.01,0.02,0.03,0.02,0.92)))
#print (df)
In [227]: %timeit df.replace(d)
1 loop, best of 3: 661 ms per loop
In [228]: %timeit df.stack().map(d).unstack()
1 loop, best of 3: 381 ms per loop
Upvotes: 1