Reputation: 2288
I want to combine columns in a dataframe depending on whether the data is numeric or not, for example:
import pandas as pd
import numpy as np
x = {'a':[1,2], 'b':['foo','bar'],'c':[np.pi,np.e]}
y = pd.DataFrame.from_dict(x)
y.apply(lambda x: x.sum() if x.dtype in (np.int64,np.float64) else x.min())
This gives the desired output, but it seems like there should be a nicer way to write the last line. Is there a simple way to check whether a column's dtype is a numeric numpy type, instead of checking whether the dtype is in a specified list of numpy dtypes?
Upvotes: 2
Views: 3765
Reputation: 375415
Rather than using apply here, I would check whether each column is numeric with a simple list comprehension, handle the numeric and non-numeric columns separately, and then concat the results back together. This will be more efficient for larger frames.
In [11]: numeric = np.array([dtype in [np.int64, np.float64] for dtype in y.dtypes])
In [12]: numeric
Out[12]: array([True, False, True])
There may be an is_numeric_dtype function, but I'm not sure where it is.
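For what it's worth, recent pandas versions expose such a helper as pandas.api.types.is_numeric_dtype. The sketch below assumes that import path is available in your pandas version and rebuilds the same boolean mask without a hard-coded dtype list. Note that this helper also treats boolean columns as numeric, which may or may not be what you want.

```python
import numpy as np
import pandas as pd
from pandas.api.types import is_numeric_dtype  # assumes a pandas version that ships pandas.api.types

# Same frame as in the question
y = pd.DataFrame({'a': [1, 2], 'b': ['foo', 'bar'], 'c': [np.pi, np.e]})

# One boolean per column: True where the column dtype is numeric
numeric = np.array([is_numeric_dtype(dtype) for dtype in y.dtypes])
```

The resulting mask can be used with y.iloc[:, numeric] and y.iloc[:, ~numeric] in the same way as the mask built from the explicit dtype list.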
In [13]: y.iloc[:, numeric].sum()
Out[13]:
a 3.000000
c 5.859874
dtype: float64
In [14]: y.iloc[:, ~numeric].min()
Out[14]:
b bar
dtype: object
Now you can concat these and reindex to restore the original column order:
In [15]: pd.concat([y.iloc[:, numeric].sum(), y.iloc[:, ~numeric].min()]).reindex(y.columns)
Out[15]:
a 3
b bar
c 5.859874
dtype: object
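As an alternative way to write the same split, sum, min, and concat steps, DataFrame.select_dtypes can pick the numeric and non-numeric columns directly. This is only a sketch: the function name combine_columns is made up for illustration, and it assumes that "number" is an acceptable definition of numeric for your data.

```python
import numpy as np
import pandas as pd

def combine_columns(df):
    """Sum numeric columns, take the min of the rest (hypothetical helper)."""
    # Columns whose dtype is a numpy number type (ints, floats, ...)
    numeric_part = df.select_dtypes(include="number").sum()
    # Everything else, e.g. object/string columns
    other_part = df.select_dtypes(exclude="number").min()
    # Stitch the two results together and restore the original column order
    return pd.concat([numeric_part, other_part]).reindex(df.columns)

y = pd.DataFrame({'a': [1, 2], 'b': ['foo', 'bar'], 'c': [np.pi, np.e]})
result = combine_columns(y)
```

As with the concat above, the result has object dtype because it mixes numbers and strings.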
Upvotes: 2