Reputation:
There are many different ways to select a column in a pandas.DataFrame
(same for rows). I am wondering if it makes any difference and if there are any performance and style recommendations.
E.g., if I have a DataFrame as follows:
import pandas as pd
import numpy as np
df = pd.DataFrame(data=np.random.random((10,4)), columns=['a','b','c','d'])
df.head()
There are many different ways to select e.g., column d
df['d']
df.loc[:,'d']
(where df.loc[row_indexer,column_indexer]
)df.loc[:]['d']
df.ix[:]['d']
df.ix[:,'d']
Intuitively, I would prefer 2), maybe because I am used to the [row_indexer,column_indexer]
style from numpy
Upvotes: 3
Views: 967
Reputation: 3650
I would use ipython
's magic function %timeit
to find out the best performant method.
The results are:
%timeit df['d']
100000 loops, best of 3: 5.35 µs per loop
%timeit df.loc[:,'d']
10000 loops, best of 3: 44.3 µs per loop
%timeit df.loc[:]['d']
100000 loops, best of 3: 12.4 µs per loop
%timeit df.ix[:]['d']
100000 loops, best of 3: 10.4 µs per loop
%timeit df.ix[:,'d']
10000 loops, best of 3: 53 µs per loop
It turns out that the 1st method is considerably faster than others.
Upvotes: 4