Reputation: 759
I have a Pandas dataframe in Python (3.6) with numeric and categorical attributes. I want to pull a list of numeric columns for use in other parts of my code. My question is what is the most efficient way of doing this?
This seems to be the standard answer:
num_cols = df.select_dtypes([np.number]).columns.tolist()
But I'm worried that select_dtypes() can be slow, and this seems to add an intermediate step that I'm hoping isn't necessary (subsetting the data before pulling back the column names of just the numeric attributes).
Any ideas on a more efficient way of doing this? (I know there is a private method, _get_numeric_data(), that could also be used, but I wasn't able to find out how it works, and I don't love relying on a private method as a long-term solution.)
Upvotes: 2
Views: 910
Reputation: 961
Two ways (without using df.select_dtypes, which unnecessarily creates a temporary intermediate dataframe):
import numpy as np
[c for c in df.columns if np.issubdtype(df[c].dtype, np.number)]
from pandas.api.types import is_numeric_dtype
[c for c in df.columns if is_numeric_dtype(df[c])]
Or if you want the result to be a pd.Index rather than just a list of column name strings as above, here are three ways (first is from @juanpa.arrivillaga):
import numpy as np
df.columns[[np.issubdtype(dt, np.number) for dt in df.dtypes]]
from pandas.api.types import is_numeric_dtype
df.columns[[is_numeric_dtype(df[c]) for c in df.columns]]
from pandas.api.types import is_numeric_dtype
df.columns[list(map(is_numeric_dtype, df.dtypes))]
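A note on bool columns: the np.issubdtype(..., np.number) variants do not treat a bool column as numeric (tested with numpy 1.22.3 / pandas 1.4.2), whereas is_numeric_dtype returns True for a bool dtype, so filter bool columns out separately if you don't want them.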
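To make the bool behaviour concrete, here is a minimal sketch on a small made-up frame (the column names i, f, b and s are hypothetical, not from the question). It compares the np.issubdtype check with is_numeric_dtype applied to each column's Series; the string column is given an explicit object dtype so that both checks only see plain NumPy dtypes.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Hypothetical example frame: int, float, bool and object (string) columns.
df = pd.DataFrame({
    "i": [1, 2],
    "f": [1.5, 2.5],
    "b": [True, False],
    "s": pd.Series(["x", "y"], dtype=object),
})

# NumPy check: bool is not a subtype of np.number, so "b" is excluded.
via_numpy = [c for c in df.columns if np.issubdtype(df[c].dtype, np.number)]

# pandas check: is_numeric_dtype counts bool as numeric, so "b" is included.
via_pandas = [c for c in df.columns if is_numeric_dtype(df[c])]

print(via_numpy)   # ['i', 'f']
print(via_pandas)  # ['i', 'f', 'b']
```

Neither list comprehension builds an intermediate dataframe; both only look at each column's dtype.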
Upvotes: 0
Reputation: 96246
df.select_dtypes is for selecting data: it makes a copy of your data, which you essentially discard by then selecting only the column names. This is inefficient. Just use something like:
df.columns[[np.issubdtype(dt, np.number) for dt in df.dtypes]]
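The same dtype-only idea can be wrapped in a small helper. This is only a sketch: numeric_columns is a hypothetical name, and it swaps np.issubdtype for pandas' is_numeric_dtype (excluding bool explicitly) on the assumption that the frame may contain category columns, which np.issubdtype may not be able to interpret. On the made-up frame below it returns the same names as select_dtypes without copying any data.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_bool_dtype, is_numeric_dtype


def numeric_columns(df):
    """Return the names of numeric (non-bool) columns by inspecting dtypes only."""
    return [
        name
        for name, dt in df.dtypes.items()
        if is_numeric_dtype(dt) and not is_bool_dtype(dt)
    ]


# Hypothetical example frame with numeric, bool and categorical columns.
df = pd.DataFrame({
    "i": [1, 2, 3],
    "f": [0.1, 0.2, 0.3],
    "b": [True, False, True],
    "cat": pd.Categorical(["a", "b", "a"]),
})

num_cols = numeric_columns(df)
print(num_cols)  # ['i', 'f']

# Same result as the select_dtypes approach from the question.
print(num_cols == df.select_dtypes([np.number]).columns.tolist())  # True
```

Iterating over df.dtypes touches one dtype object per column, so the cost depends on the number of columns rather than the number of rows.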
Upvotes: 3