famargar
famargar

Reputation: 3458

Fast way to replace value in sparse dataframe with value from dictionary

I have a very sparse dataframe df such as this:

       Apples Bananas Pineapple Mango
Mary   Apples     NaN       NaN   NaN
Jane      NaN Bananas       NaN   NaN
Diego     NaN     NaN       NaN Mango
Guido     NaN     NaN Pineapple   NaN

and I would like to build a dictionary d such as

d = {'Apples':3, 'Bananas':1, 'Pineapple':2, 'Mango': 15}

to obtain

       Apples Bananas Pineapple Mango
Mary        3     NaN       NaN   NaN
Jane      NaN       1       NaN   NaN
Diego     NaN     NaN       NaN    15
Guido     NaN     NaN         2   NaN

I can do

df.to_sparse().replace(d)

but it's been over 30' and no output yet. My dataframe has 10000 rows times 1500 columns, the dataframe is originally 135MB, that becomes 850kB after to_sparse(). Is there any faster way?

Upvotes: 1

Views: 159

Answers (1)

jezrael
jezrael

Reputation: 863166

EDIT by changed question - you can use stack for Series with MultiIndex - (NaNs values are removed) - with map and then for reshape back unstack:

np.random.seed(1235)
N = 1000
d = {'Apples':3, 'Bananas':1, 'Pineapple':2, 'Mango': 15}
df = pd.DataFrame(np.random.choice(list(d.keys()) + [np.nan], 
                  size=(N, N), 
                  p=(0.01,0.02,0.03,0.02,0.92)))
#print (df)

In [227]: %timeit df.replace(d)
1 loop, best of 3: 661 ms per loop

In [228]: %timeit df.stack().map(d).unstack()
1 loop, best of 3: 381 ms per loop

Upvotes: 1

Related Questions