Reputation: 373
For example, suppose my dataset has a flower species, number of petals, germination time and user ID. The user ID contains a hyphen, so I don't want to use it in my data analysis. I'm aware that I could hard-code which columns to drop, but I want any dataset I read in to have its non-numeric columns removed automatically.
Edit: To clarify, I'm reading the data from a CSV file using pandas.
Example:
   Species  NPetals  GermTime  UserID
1  R. G     5        4         65-78
2  R. F     5        3         65-81
I want to remove the UserID and Species columns from the dataset.
Upvotes: 1
Views: 2320
Reputation: 394031
As the docs describe, you can select just the numeric columns by filtering with select_dtypes:
In [5]:
import numpy as np
import pandas as pd
# 'a' is float32, 'b' is bool, 'c' is float64
df = pd.DataFrame({'a': np.random.randn(6).astype('f4'), 'b': [True, False] * 3, 'c': [1.0, 2.0] * 3})
df
Out[5]:
a b c
0 0.338710 True 1
1 1.530095 False 2
2 -0.048261 True 1
3 -0.505742 False 2
4 0.729667 True 1
5 -0.634482 False 2
In [15]:
df.select_dtypes(include=[np.number])
Out[15]:
a c
0 0.338710 1
1 1.530095 2
2 -0.048261 1
3 -0.505742 2
4 0.729667 1
5 -0.634482 2
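You can pass any valid NumPy dtype or dtype hierarchy (np.number covers all integer and float types) to include or exclude. Note that the bool column b is dropped above because bool is not part of np.number.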
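Applied to the data in the question, this might look like the sketch below. The CSV text is a hypothetical stand-in for the asker's file (the real file name and delimiter are not given), read through io.StringIO so the example is self-contained:

```python
import io

import numpy as np
import pandas as pd

# Hypothetical CSV contents mirroring the example in the question.
csv_text = """Species,NPetals,GermTime,UserID
R. G,5,4,65-78
R. F,5,3,65-81
"""

df = pd.read_csv(io.StringIO(csv_text))

# Species and UserID are parsed as text, so only the numeric columns remain.
numeric_df = df.select_dtypes(include=[np.number])
print(numeric_df.columns.tolist())  # ['NPetals', 'GermTime']
```

One caveat: a column that holds numbers but contains a stray non-numeric entry is parsed as text and would be dropped as well, so it may be worth checking df.dtypes first.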
Upvotes: 3