Combining multiple data types in pandas DataFrame

Question

I want to combine columns in a dataframe depending on whether the data is numeric or not, for example:

import pandas as pd
import numpy as np

x = {'a':[1,2], 'b':['foo','bar'],'c':[np.pi,np.e]}
y = pd.DataFrame.from_dict(x)
y.apply(lambda x: x.sum() if x.dtype in (np.int64,np.float64) else x.min())

This gives the desired output, but it seems like there should be a nicer way to write the last line. Is there a simple way to just check whether the data is a numpy scalar type, rather than checking whether the dtype is in a specified list of numpy dtypes?

Andy Hayden · Accepted Answer

Rather than use apply here, I would check whether each column is numeric with a simple list comprehension, handle the two groups separately, and then concat the results back together. This will be more efficient for larger frames.

In [11]: numeric = np.array([dtype in [np.int64, np.float64] for dtype in y.dtypes])

In [12]: numeric
Out[12]: array([True, False, True])

pandas also has an is_numeric_dtype function (in pandas.api.types in current versions), which would avoid hard-coding the list of dtypes.
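
As a rough sketch of the same mask built with is_numeric_dtype, assuming a pandas version that exposes it under pandas.api.types (the frame is rebuilt here only so the snippet runs on its own):

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype

# Same example frame as in the question.
y = pd.DataFrame({'a': [1, 2], 'b': ['foo', 'bar'], 'c': [np.pi, np.e]})

# Boolean mask: True where the column dtype is numeric, with no hard-coded dtype list.
numeric = np.array([is_numeric_dtype(dtype) for dtype in y.dtypes])

sums = y.iloc[:, numeric].sum()   # numeric columns: a and c
mins = y.iloc[:, ~numeric].min()  # remaining columns: b
```

This should also pick up other integer and float widths (int32, float32, and so on) that the explicit np.int64/np.float64 list would miss.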

In [13]: y.iloc[:, numeric].sum()
Out[13]: 
a    3.000000
c    5.859874
dtype: float64

In [14]: y.iloc[:, ~numeric].min()
Out[14]: 
b    bar
dtype: object

Now you can concat these and potentially reindex:

In [15]: pd.concat([y.iloc[:, numeric].sum(), y.iloc[:, ~numeric].min()]).reindex(y.columns)
Out[15]: 
a           3
b         bar
c    5.859874
dtype: object
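
An alternative to building the mask by hand, not part of the steps above, is to let DataFrame.select_dtypes do the split. This is only a sketch on the same example frame; the variable names are made up for illustration:

```python
import numpy as np
import pandas as pd

# Same example frame as in the question.
y = pd.DataFrame({'a': [1, 2], 'b': ['foo', 'bar'], 'c': [np.pi, np.e]})

# Split by dtype, aggregate each part, then restore the original column order.
numeric_part = y.select_dtypes(include='number').sum()
other_part = y.select_dtypes(exclude='number').min()
result = pd.concat([numeric_part, other_part]).reindex(y.columns)
```

As with the concat above, the combined Series ends up with object dtype because it mixes numbers and strings.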
