Ailurophile

Reputation: 3005

Is there a Python function for finding the numeric and categorical columns?

What is an efficient way to split the columns of a pandas DataFrame into categorical columns and numeric columns in Python?

So far I'm using the function below to find the categorical and numeric columns.

def returnCatNumList(df):
    
    object_cols = list(df.select_dtypes(exclude=['int', 'float', 'int64', 'float64', 
                                                 'int32', 'float32', 'int16', 'float16']).columns)
    numeric_cols = list(df.select_dtypes(include=['int', 'float', 'int64', 'float64', 
                                                  'int32', 'float32', 'int16', 'float16']).columns)

    return object_cols, numeric_cols

I'm looking for an efficient and better approach to do this. Any suggestions or references would be highly appreciated.

Upvotes: 2

Views: 284

Answers (3)

Ailurophile

Reputation: 3005

We can also use the pandas types API (pd.api.types), which provides functions for checking the dtype of each column:

import pandas as pd
import seaborn as sns

def returnCatNumList(df):
    object_cols = []
    numeric_cols = []

    # Check each column's dtype and file its label in the matching list
    for label, content in df.items():
        if pd.api.types.is_numeric_dtype(content):
            numeric_cols.append(label)
        else:
            object_cols.append(label)
    return object_cols, numeric_cols

Example:

iris = sns.load_dataset('iris')

object_cols, numeric_cols = returnCatNumList(iris)

print(object_cols)
print(numeric_cols)

Output:

['species']

['sepal_length', 'sepal_width', 'petal_length', 'petal_width']

Upvotes: 0

Multivac

Reputation: 815

You can do this simply by selecting on the object dtype:

def returnCatNumList(df):
    
    object_cols = df.select_dtypes(include="object").columns.tolist()
    numeric_cols = df.select_dtypes(exclude="object").columns.tolist()

    return object_cols, numeric_cols
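
One thing to watch with an object-based split, sketched here on a made-up frame: every column that is not object dtype ends up on the "numeric" side, including bool and datetime columns. The frame and its column names are hypothetical and only serve to illustrate the behaviour; the final line uses the "number" selector for comparison.

```python
import pandas as pd

# Hypothetical frame, invented purely for illustration.
df = pd.DataFrame({
    "name": ["a", "b"],
    "age": [30, 41],
    "active": [True, False],
    "joined": pd.to_datetime(["2020-01-01", "2021-06-15"]),
})
# Force object dtype so the example does not depend on how the
# installed pandas version infers string columns.
df["name"] = df["name"].astype(object)

object_cols = df.select_dtypes(include="object").columns.tolist()
non_object_cols = df.select_dtypes(exclude="object").columns.tolist()

# Selecting on "number" keeps only the genuinely numeric column.
number_cols = df.select_dtypes(include="number").columns.tolist()

print(object_cols)      # columns stored as object
print(non_object_cols)  # everything else, numeric or not
print(number_cols)      # numeric dtypes only
```

If the frame can contain bool, datetime, or category columns, it is probably safer to select on the numeric side explicitly rather than to exclude object.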

Upvotes: 1

jezrael

Reputation: 863166

You can simplify your solution by using np.number instead of a list of numeric dtypes:

import numpy as np

def returnCatNumList(df):
    
    object_cols = list(df.select_dtypes(exclude=np.number).columns)
    numeric_cols = list(df.select_dtypes(include=np.number).columns)

    return object_cols, numeric_cols

Another idea is to use Index.difference for numeric_cols:

def returnCatNumList(df):
    
    object_cols = list(df.select_dtypes(exclude=np.number).columns)
    numeric_cols = list(df.columns.difference(object_cols, sort=False))

    return object_cols, numeric_cols
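
As a rough usage sketch of the np.number approach, here is a small made-up frame that includes an int8 column and a category column. The function name split_columns and the data are hypothetical. The point is that np.number matches every numeric width, including ones such as int8 that a hand-written dtype list can miss, while string and category columns fall on the non-numeric side.

```python
import numpy as np
import pandas as pd


def split_columns(df):
    """Return (non_numeric_cols, numeric_cols); hypothetical helper name."""
    numeric_cols = df.select_dtypes(include=np.number).columns.tolist()
    other_cols = df.select_dtypes(exclude=np.number).columns.tolist()
    return other_cols, numeric_cols


# Hypothetical data, invented for illustration.
df = pd.DataFrame({
    "city": ["Oslo", "Lima", "Pune"],
    "size": pd.Categorical(["s", "m", "s"]),
    "count": np.array([1, 2, 3], dtype="int8"),
    "price": [9.5, 3.25, 7.0],
})

other_cols, numeric_cols = split_columns(df)
print(other_cols)
print(numeric_cols)
```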

Upvotes: 2
