user2489252

Pandas selecting columns - best habit and performance

There are many different ways to select a column in a pandas.DataFrame (and likewise for rows). I am wondering whether the choice makes any difference, and whether there are any performance or style recommendations.

E.g., if I have a DataFrame as follows:

import pandas as pd
import numpy as np

df = pd.DataFrame(data=np.random.random((10,4)), columns=['a','b','c','d'])
df.head()

There are many different ways to select, e.g., column d.
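For concreteness, here is a minimal sketch of the label-based variants, numbered in the order in which the answer times them; treat that numbering as an assumption. The df.ix forms are left out because .ix was removed in pandas 1.0, and the df.iloc line is an extra added only for comparison with NumPy-style positional indexing.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random((10, 4)), columns=["a", "b", "c", "d"])

s1 = df["d"]          # 1) plain column lookup
s2 = df.loc[:, "d"]   # 2) [row_indexer, column_indexer] style
s3 = df.loc[:]["d"]   # 3) chained: select all rows, then look up the column
s4 = df.iloc[:, 3]    # positional form, closest to NumPy's arr[:, 3] (added for comparison)

# Every variant should hand back the same Series.
same = all(s1.equals(other) for other in (s2, s3, s4))
print(same)
```

All of these are expected to return the same Series, so the choice between them comes down to readability and speed rather than the result.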

Intuitively, I would prefer 2), maybe because I am used to the [row_indexer, column_indexer] style from NumPy.

Upvotes: 3

Views: 967

Answers (1)

suzanshakya

Reputation: 3650

I would use IPython's %timeit magic function to find the best-performing method. The results are:

%timeit df['d']
100000 loops, best of 3: 5.35 µs per loop

%timeit df.loc[:,'d']
10000 loops, best of 3: 44.3 µs per loop

%timeit df.loc[:]['d']
100000 loops, best of 3: 12.4 µs per loop

%timeit df.ix[:]['d']
100000 loops, best of 3: 10.4 µs per loop

%timeit df.ix[:,'d']
10000 loops, best of 3: 53 µs per loop

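It turns out that the first method, df['d'], is considerably faster than the others: about twice as fast as the chained df.loc[:]['d'] and df.ix[:]['d'] forms, and roughly 8 to 10 times faster than df.loc[:,'d'] and df.ix[:,'d'].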
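If IPython is not at hand, a rough equivalent can be sketched with the standard-library timeit module. This is only a sketch: the helper name best_time_us and the loop counts are illustrative choices, the .ix variants are omitted because .ix was removed in pandas 1.0, and absolute numbers will vary with the machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random((10, 4)), columns=["a", "b", "c", "d"])

# Selection styles to compare, keyed by a printable label.
candidates = {
    "df['d']": lambda: df["d"],
    "df.loc[:, 'd']": lambda: df.loc[:, "d"],
    "df.loc[:]['d']": lambda: df.loc[:]["d"],
}


def best_time_us(func, number=1000, repeat=3):
    """Best per-call time in microseconds over `repeat` runs of `number` calls."""
    runs = timeit.repeat(func, number=number, repeat=repeat)
    return min(runs) / number * 1e6


timings = {label: best_time_us(func) for label, func in candidates.items()}

# Print fastest first.
for label, micros in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{label:<16} {micros:8.2f} µs per call")
```

Like %timeit, this takes the best of several repeats, so the figures are comparable in spirit even if they differ from the ones quoted above.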

Upvotes: 4
