megadarkfriend

Reputation: 373

How do you delete a non-numeric column from an input dataset?

For example, suppose my dataset has a flower species, number of petals, germination time and user ID. The user ID contains a hyphen, so I don't want to use it in my analysis. I know I can hard-code which columns to drop, but I want any dataset I read in to have its non-numeric columns removed automatically.

Edit: To clarify, I'm reading the data in from a CSV file using pandas.

Example:

        Species    NPetals    GermTime    UserID
    1    R. G        5          4           65-78
    2    R. F        5          3           65-81

I want to remove the UserID and Species columns from the dataset.

Upvotes: 1

Views: 2320

Answers (1)

EdChum

Reputation: 394031

From the docs: you can select just the numeric columns by filtering with select_dtypes:

In [5]:
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.random.randn(6).astype('f4'), 'b': [True, False] * 3, 'c': [1.0, 2.0] * 3})
df

Out[5]:
          a      b  c
0  0.338710   True  1
1  1.530095  False  2
2 -0.048261   True  1
3 -0.505742  False  2
4  0.729667   True  1
5 -0.634482  False  2

In [15]:    
df.select_dtypes(include=[np.number])

Out[15]:
          a  c
0  0.338710  1
1  1.530095  2
2 -0.048261  1
3 -0.505742  2
4  0.729667  1
5 -0.634482  2

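You can pass any valid NumPy dtype or dtype hierarchy (such as np.number or np.floating) to include or exclude. Note that bool columns are not part of np.number, which is why column b is dropped above.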
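
Applied to the data in the question, this could look roughly like the sketch below. The CSV text is a stand-in I made up to mirror the example table (the real file name and delimiter are not given in the question), and `csv_text` and `numeric_df` are just illustrative names.

```python
import io

import pandas as pd

# Stand-in for the asker's CSV file; contents mirror the example in the question.
csv_text = """Species,NPetals,GermTime,UserID
R. G,5,4,65-78
R. F,5,3,65-81
"""

# With a real file this would be pd.read_csv("some_file.csv") instead.
df = pd.read_csv(io.StringIO(csv_text))

# Keep only the numeric columns; Species and UserID are read as strings
# (values like "65-78" cannot be parsed as numbers), so they are left out.
numeric_df = df.select_dtypes(include="number")

print(list(numeric_df.columns))  # ['NPetals', 'GermTime']
```

One thing to watch for: a column that is meant to be numeric but contains a stray non-numeric entry will also be read as strings, so it would be dropped along with the others.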

Upvotes: 3
