niedakh
niedakh

Reputation: 2999

Creating a zero-filled pandas data frame

What is the best way to create a zero-filled pandas data frame of a given size?

I have used:

zero_data = np.zeros(shape=(len(data),len(feature_list)))
d = pd.DataFrame(zero_data, columns=feature_list)

Is there a better way to do it?

Upvotes: 169

Views: 342295

Answers (6)

Shravan
Shravan

Reputation: 2723

Create and fill a pandas dataframe with zeros

feature_list = ["foo", "bar", 37]
df = pd.DataFrame(0, index=np.arange(7), columns=feature_list) 
print(df) 

which prints:

   foo  bar  37
0    0    0   0
1    0    0   0
2    0    0   0
3    0    0   0
4    0    0   0
5    0    0   0
6    0    0   0

Upvotes: 220

chakuRak
chakuRak

Reputation: 650

If you would like the new data frame to have the same index and columns as an existing data frame, you can just multiply the existing data frame by zero:

df_zeros = df * 0

If the existing data frame contains NaNs or non-numeric values you can instead apply a function to each cell that will just return 0:

df_zeros = df.applymap(lambda x: 0)

Upvotes: 21

Alex
Alex

Reputation: 12913

It's best to do this with numpy in my opinion

import numpy as np
import pandas as pd
d = pd.DataFrame(np.zeros((N_rows, N_cols)))

Upvotes: 50

WaveRider
WaveRider

Reputation: 495

Similar to @Shravan, but without the use of numpy:

  height = 10
  width = 20
  df_0 = pd.DataFrame(0, index=range(height), columns=range(width))

Then you can do whatever you want with it:

post_instantiation_fcn = lambda x: str(x)
df_ready_for_whatever = df_0.applymap(post_instantiation_fcn)

Upvotes: 19

Mark Horvath
Mark Horvath

Reputation: 1166

Assuming having a template DataFrame, which one would like to copy with zero values filled here...

If you have no NaNs in your data set, multiplying by zero can be significantly faster:

In [19]: columns = ["col{}".format(i) for i in xrange(3000)]                                                                                       

In [20]: indices = xrange(2000)

In [21]: orig_df = pd.DataFrame(42.0, index=indices, columns=columns)

In [22]: %timeit d = pd.DataFrame(np.zeros_like(orig_df), index=orig_df.index, columns=orig_df.columns)
100 loops, best of 3: 12.6 ms per loop

In [23]: %timeit d = orig_df * 0.0
100 loops, best of 3: 7.17 ms per loop

Improvement depends on DataFrame size, but never found it slower.

And just for the heck of it:

In [24]: %timeit d = orig_df * 0.0 + 1.0
100 loops, best of 3: 13.6 ms per loop

In [25]: %timeit d = pd.eval('orig_df * 0.0 + 1.0')
100 loops, best of 3: 8.36 ms per loop

But:

In [24]: %timeit d = orig_df.copy()
10 loops, best of 3: 24 ms per loop

EDIT!!!

Assuming you have a frame using float64, this will be the fastest by a huge margin! It is also able to generate any value by replacing 0.0 to the desired fill number.

In [23]: %timeit d = pd.eval('orig_df > 1.7976931348623157e+308 + 0.0')
100 loops, best of 3: 3.68 ms per loop

Depending on taste, one can externally define nan, and do a general solution, irrespective of the particular float type:

In [39]: nan = np.nan
In [40]: %timeit d = pd.eval('orig_df > nan + 0.0')
100 loops, best of 3: 4.39 ms per loop

Upvotes: 3

mtd
mtd

Reputation: 2364

If you already have a dataframe, this is the fastest way:

In [1]: columns = ["col{}".format(i) for i in range(10)]
In [2]: orig_df = pd.DataFrame(np.ones((10, 10)), columns=columns)
In [3]: %timeit d = pd.DataFrame(np.zeros_like(orig_df), index=orig_df.index, columns=orig_df.columns)
10000 loops, best of 3: 60.2 µs per loop

Compare to:

In [4]: %timeit d = pd.DataFrame(0, index = np.arange(10), columns=columns)
10000 loops, best of 3: 110 µs per loop

In [5]: temp = np.zeros((10, 10))
In [6]: %timeit d = pd.DataFrame(temp, columns=columns)
10000 loops, best of 3: 95.7 µs per loop

Upvotes: 2

Related Questions