Reputation: 9636
I'm trying to find the names of all columns that contain only numeric
data. For this I'm using is.numeric
and applying it over my data like this:
> sapply(ds[vars], is.numeric)
MinTemp MaxTemp Rainfall Evaporation Sunshine WindGustDir WindGustSpeed WindDir9am WindDir3pm WindSpeed9am
TRUE TRUE TRUE TRUE TRUE FALSE TRUE FALSE FALSE TRUE
WindSpeed3pm Humidity9am Humidity3pm Pressure9am Pressure3pm Cloud9am Cloud3pm Temp9am Temp3pm RainToday
TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
RainTomorrow
FALSE
The above makes sense according to my data. Columns WindGustDir and WindDir9am, for example, have values like NW, so that's why they are FALSE.
When I apply this to my data to get the names of all numeric columns, I DON'T expect to see non-numeric columns such as WindGustDir and WindDir9am. However, I'm seeing WindDir9am but not WindGustDir. Question: I don't understand why this is the case. How can I fix it so I only get numeric columns?
> numerics <- names(ds)[which(sapply(ds[vars], is.numeric))]
> numerics
[1] "Date" "Location" "MinTemp" "MaxTemp" "Rainfall" "Sunshine" "WindDir9am" "WindDir3pm" "WindSpeed9am"
[10] "WindSpeed3pm" "Humidity9am" "Humidity3pm" "Pressure9am" "Pressure3pm" "Cloud9am" "Cloud3pm"
Here is the link to the data I'm using: http://rattle.togaware.com/weather.csv
Edit
> vars
[1] "MinTemp" "MaxTemp" "Rainfall" "Evaporation" "Sunshine"
[6] "WindGustDir" "WindGustSpeed" "WindDir9am" "WindDir3pm" "WindSpeed9am"
[11] "WindSpeed3pm" "Humidity9am" "Humidity3pm" "Pressure9am" "Pressure3pm"
[16] "Cloud9am" "Cloud3pm" "Temp9am" "Temp3pm" "RainToday"
[21] "RainTomorrow"
Upvotes: 0
Views: 4768
Reputation: 89057
When you do:
which(sapply(ds[vars], is.numeric))
you get the indices of the numeric columns of ds[vars] (not ds). So if you want to get the names back, it is important that you apply those indices to names(ds[vars]) and not names(ds), which has different columns:
names(ds[vars])[which(sapply(ds[vars], is.numeric))]
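To make the mismatch concrete, here is a minimal sketch on a toy data frame. The column values and the shortened column set are made up for illustration and are not taken from weather.csv; only the shape of the problem (extra leading columns in ds that are not in vars) mirrors the question.

```r
# Toy data frame standing in for the weather data (hypothetical values).
ds <- data.frame(
  Date     = c("2007-11-01", "2007-11-02"),
  Location = c("Canberra", "Canberra"),
  MinTemp  = c(8.0, 14.0),
  WindDir  = c("NW", "ENE"),
  MaxTemp  = c(24.3, 26.9),
  stringsAsFactors = FALSE
)
vars <- c("MinTemp", "WindDir", "MaxTemp")

# Positions of the numeric columns *within ds[vars]*: 1 and 3.
idx <- which(sapply(ds[vars], is.numeric))

# Using those positions on names(ds) picks columns 1 and 3 of ds instead.
wrong <- names(ds)[idx]        # "Date" "MinTemp"

# Using them on names(ds[vars]) picks the intended columns.
right <- names(ds[vars])[idx]  # "MinTemp" "MaxTemp"
```

The positions are only meaningful relative to the object they were computed on, which is why "Date" and "Location" show up in the question's output.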
You can also just do:
vars[which(sapply(ds[vars], is.numeric))]
and even use logical indexing like Richard suggested:
vars[sapply(ds[vars], is.numeric)]
Last, I would consider whether vars is useful at all; see if doing the work directly on ds:
names(ds)[sapply(ds, is.numeric)]
gets you what you want.
Upvotes: 3
Reputation: 900
which(sapply(ds[vars], is.numeric))
should provide a vector of indices for the columns that contain numeric data. Assuming ds is a data.frame, you can then use this vector to subset your original data:
ids <- which(sapply(ds, is.numeric))
foo <- ds[, ids]
Edit: On second thought, there's no need for which() at all. Just subset on the result of your sapply():
names(ds[, sapply(ds, is.numeric)])
#[1] "MinTemp" "MaxTemp" "Rainfall" "Evaporation" "Sunshine"
#[6] "WindGustSpeed" "WindSpeed9am" "WindSpeed3pm" "Humidity9am" "Humidity3pm"
#[11] "Pressure9am" "Pressure3pm" "Cloud9am" "Cloud3pm" "Temp9am"
#[16] "Temp3pm" "RISK_MM"
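One caveat with the ds[, ...] form, sketched on a made-up two-column data frame (one_num is a hypothetical name): if only one column turns out to be numeric, the default drop = TRUE collapses the result to a plain vector, so names() returns NULL. Passing drop = FALSE, or indexing names(ds) directly, avoids that.

```r
# Hypothetical data frame with exactly one numeric column.
one_num <- data.frame(a = c(1.5, 2.5), b = c("x", "y"), stringsAsFactors = FALSE)
keep <- sapply(one_num, is.numeric)

# A single selected column is dropped to a numeric vector, which has no names.
dropped <- names(one_num[, keep])

# Keeping the data frame structure preserves the column name.
kept_a <- names(one_num[, keep, drop = FALSE])

# Indexing the names directly never touches the data at all.
kept_b <- names(one_num)[keep]
```

With the weather data this does not bite, since many columns are numeric, but it matters if the same line is reused on narrower data.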
Upvotes: 2