Reputation: 11633
I figured out these two methods for summing every value in a DataFrame. Is there a better one?
>>> import pandas as pd
>>> df = pd.DataFrame({'A': [5, 6, 7], 'B': [7, 8, 9]})
>>> print(df.sum().sum())
42
>>> print(df.values.sum())
42
Just want to make sure I'm not missing something more obvious.
Upvotes: 63
Views: 78253
Reputation: 741
Adding some numbers to support this:
import numpy as np, pandas as pd
import timeit
df = pd.DataFrame(np.arange(int(1e6)).reshape(500000, 2), columns=list("ab"))
def pandas_test():
    return df['a'].sum()
def numpy_test():
    return df['a'].to_numpy().sum()
timeit.timeit(numpy_test, number=1000) # 0.5032469799989485
timeit.timeit(pandas_test, number=1000) # 0.6035906639990571
So on my machine the NumPy version takes about 17% less time (put the other way, pandas takes roughly 20% longer), and that is just for a Series sum!
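The same comparison can be run on the whole frame, which is what the question actually asks about. This is only a sketch: the smaller frame, the repetition count, and the function names are choices made here to keep the run short, and timings vary by machine, so none are quoted.

```python
import timeit

import numpy as np
import pandas as pd

# Smaller frame than in the benchmark above, chosen only to keep the run short.
# int64 is requested explicitly so the total cannot overflow on any platform.
df_small = pd.DataFrame(
    np.arange(200_000, dtype=np.int64).reshape(100_000, 2), columns=list("ab")
)

def pandas_frame_sum():
    # Sum each column with pandas, then sum the per-column totals.
    return df_small.sum().sum()

def numpy_frame_sum():
    # Convert to a NumPy array once and sum every element.
    return df_small.to_numpy().sum()

# Both approaches must give the same total before timing them.
assert pandas_frame_sum() == numpy_frame_sum()

pandas_time = timeit.timeit(pandas_frame_sum, number=100)
numpy_time = timeit.timeit(numpy_frame_sum, number=100)
print(f"pandas: {pandas_time:.4f}s  numpy: {numpy_time:.4f}s")
```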
Upvotes: 5
Reputation: 294508
df.to_numpy().sum()
df.values is the underlying NumPy array, so
df.values.sum()
calls the NumPy sum method and is faster.
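One caveat worth checking, offered as a sketch rather than as part of the original answer: the pandas and NumPy routes agree only when the frame has no missing values. pandas skips NaN by default (skipna=True), while the NumPy sum method propagates it. np.nansum is the NumPy function that ignores NaN. The frame below is made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with one missing value (not from the original post).
df_nan = pd.DataFrame({'A': [5.0, np.nan, 7.0], 'B': [7.0, 8.0, 9.0]})

pandas_total = df_nan.sum().sum()             # pandas skips NaN by default
numpy_total = df_nan.to_numpy().sum()         # NumPy propagates NaN
nansum_total = np.nansum(df_nan.to_numpy())   # NumPy, ignoring NaN

print(pandas_total, numpy_total, nansum_total)
```

If the data may contain NaN, np.nansum keeps the NumPy route while matching the pandas default.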
Upvotes: 77