birdy

How to find all numeric columns in data

I'm trying to find the names of all columns that contain only numeric data. For this I'm using is.numeric and applying it over my data like this:

> sapply(ds[vars], is.numeric)
      MinTemp       MaxTemp      Rainfall   Evaporation      Sunshine   WindGustDir WindGustSpeed    WindDir9am    WindDir3pm  WindSpeed9am 
         TRUE          TRUE          TRUE          TRUE          TRUE         FALSE          TRUE         FALSE         FALSE          TRUE 
 WindSpeed3pm   Humidity9am   Humidity3pm   Pressure9am   Pressure3pm      Cloud9am      Cloud3pm       Temp9am       Temp3pm     RainToday 
         TRUE          TRUE          TRUE          TRUE          TRUE          TRUE          TRUE          TRUE          TRUE         FALSE 
 RainTomorrow 
        FALSE 

The above makes sense according to my data. Columns WindGustDir and WindDir9am, for example, have values like NW, so that's why they are FALSE.
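A minimal illustration of why such columns report FALSE, using hypothetical wind-direction values rather than the real file: is.numeric() is FALSE for character data and also for factors (which read.csv() produces for text columns when stringsAsFactors = TRUE, the default before R 4.0.0), even though a factor stores integer codes internally.

```r
# Hypothetical wind-direction values, not read from weather.csv
wind_chr <- c("NW", "ENE", "NW")
wind_fac <- factor(wind_chr)

is.numeric(wind_chr)   # FALSE: character data
is.numeric(wind_fac)   # FALSE: factors are not treated as numeric...
typeof(wind_fac)       # "integer": ...even though they store integer codes
```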

When I use this to get the names of all numeric columns, I DON'T expect to see non-numeric columns such as WindGustDir and WindDir9am. However, I'm seeing WindDir9am (though not WindGustDir). I don't understand why this is the case. How can I fix it so I only get numeric columns?

> numerics <- names(ds)[which(sapply(ds[vars], is.numeric))]
> numerics
 [1] "Date"         "Location"     "MinTemp"      "MaxTemp"      "Rainfall"     "Sunshine"     "WindDir9am"   "WindDir3pm"   "WindSpeed9am"
[10] "WindSpeed3pm" "Humidity9am"  "Humidity3pm"  "Pressure9am"  "Pressure3pm"  "Cloud9am"     "Cloud3pm"  

Here is the link to the data I'm using: http://rattle.togaware.com/weather.csv

Edit

> vars
 [1] "MinTemp"       "MaxTemp"       "Rainfall"      "Evaporation"   "Sunshine"     
 [6] "WindGustDir"   "WindGustSpeed" "WindDir9am"    "WindDir3pm"    "WindSpeed9am" 
[11] "WindSpeed3pm"  "Humidity9am"   "Humidity3pm"   "Pressure9am"   "Pressure3pm"  
[16] "Cloud9am"      "Cloud3pm"      "Temp9am"       "Temp3pm"       "RainToday"    
[21] "RainTomorrow"

Upvotes: 0

Views: 4768

Answers (2)

flodel

When you do:

which(sapply(ds[vars], is.numeric))

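you get the indices of the numeric columns of ds[vars] (not ds). So if you want to get back the names, it is important to apply the indices to names(ds[vars]), not to names(ds), which has different columns. Your output shows the shift: ds starts with Date and Location, which are not in vars, so each index picks a name two columns too early. For example, position 10 in ds[vars] is the numeric WindSpeed9am, but names(ds)[10] is "WindDir9am".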

names(ds[vars])[which(sapply(ds[vars], is.numeric))]
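A small reproduction of the mismatch, assuming a made-up data frame whose first two columns (standing in for Date and Location) are not in vars:

```r
# Made-up data: Date and Location stand in for the question's leading
# columns, which are not part of vars
ds <- data.frame(
  Date        = c("d1", "d2"),
  Location    = c("A", "B"),
  MinTemp     = c(8.0, 14.0),
  WindGustDir = c("NW", "ENE"),
  Rainfall    = c(0.0, 3.6),
  stringsAsFactors = FALSE
)
vars <- c("MinTemp", "WindGustDir", "Rainfall")

# Positions are counted within ds[vars]: MinTemp = 1, Rainfall = 3
idx <- which(sapply(ds[vars], is.numeric))

wrong <- names(ds)[idx]        # "Date" "MinTemp": shifted by two columns
right <- names(ds[vars])[idx]  # "MinTemp" "Rainfall"
```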

You can also just do:

vars[which(sapply(ds[vars], is.numeric))]

and even use logical indexing like Richard suggested:

vars[sapply(ds[vars], is.numeric)]

Lastly, I would consider whether vars is useful at all; see if doing the work directly on ds:

names(ds)[sapply(ds, is.numeric)]

gets you what you want.
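A sketch of the whole-data approach on a hypothetical data frame, swapping sapply() for vapply() (a substitution, not part of the answer above): vapply() is guaranteed to return one logical per column, whereas sapply() returns list() for a data frame with no columns, and a list cannot be used as an index.

```r
# Hypothetical data with a mix of column types (made-up values)
ds <- data.frame(
  Location = c("A", "B"),
  MinTemp  = c(8.0, 14.0),
  WindDir  = factor(c("NW", "ENE")),
  Humidity = c(68L, 80L),
  stringsAsFactors = FALSE
)

# vapply() is like sapply() but promises exactly one logical per column
is_num   <- vapply(ds, is.numeric, logical(1))
numerics <- names(ds)[is_num]   # "MinTemp" "Humidity" (integers count as numeric)

# Edge case: with zero columns, vapply() still yields a logical vector
empty <- ds[0]
empty_numerics <- names(empty)[vapply(empty, is.numeric, logical(1))]
```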

Upvotes: 3

x4nd3r

which(sapply(ds[vars], is.numeric)) returns a vector of indices of the columns that contain numeric data, but those positions refer to ds[vars], not ds. Assuming ds is a data.frame, compute the indices on the same object you then subset:

ids <- which(sapply(ds, is.numeric))
foo <- ds[, ids]
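One caveat with the ds[, ids] form, sketched with a hypothetical data frame that has a single numeric column: matrix-style indexing drops to a plain vector when only one column is selected. Adding drop = FALSE, using list-style indexing ds[ids], or base R's Filter(is.numeric, ds) (an alternative not used in this answer) keeps a data frame.

```r
# Hypothetical data frame with only ONE numeric column (made-up values)
ds <- data.frame(Location = c("A", "B"), MinTemp = c(8.0, 14.0),
                 stringsAsFactors = FALSE)
ids <- which(sapply(ds, is.numeric))

class(ds[, ids])                 # "numeric": dropped to a vector
class(ds[, ids, drop = FALSE])   # "data.frame"
class(ds[ids])                   # "data.frame": list-style indexing never drops
class(Filter(is.numeric, ds))    # "data.frame"
```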

Edit: on second thought, there's no need for which() at all. Just subset on the result of your sapply():

names(ds[, sapply(ds, is.numeric)])
#[1] "MinTemp"       "MaxTemp"       "Rainfall"      "Evaporation"   "Sunshine"     
#[6] "WindGustSpeed" "WindSpeed9am"  "WindSpeed3pm"  "Humidity9am"   "Humidity3pm"  
#[11] "Pressure9am"   "Pressure3pm"   "Cloud9am"      "Cloud3pm"      "Temp9am"      
#[16] "Temp3pm"       "RISK_MM" 

Upvotes: 2
