Reputation: 628
I am trying to segment Census data that comes fairly disaggregated (e.g. age variables in 5-yr groups) by creating summary variables based on aggregation (e.g. all males 18+ per county). My solution is rowSums, e.g. county$MalesOver18 <- rowSums(county[,c(68:87)]), where vars 68-87 sum to males 18+ -- works fine. However, with 500 variables it is not efficient to count out the positions of my start/end columns.
But when I use my preferred solution, column names for rowSums (e.g. rowSums(county[,c(H76007:H76025)]), where H vars = field names), I get one of 2 error messages:
run w/ col names in quotes:
Error in "H76007":"H76025" : NA/NaN argument
In addition: Warning messages:
1: In `[.data.frame`(county, , c("H76007":"H76025")) :
  NAs introduced by coercion
2: In `[.data.frame`(county, , c("H76007":"H76025")) :
  NAs introduced by coercion
run w/ col names not in quotes:
Error in `[.data.frame`(county, , c(H76007:H76025)) :
  object 'H76007' not found
I have tried using the na.rm argument & setting my variables as numeric -- although they are already integers -- to no avail.
Any guidance? Thanks.
Upvotes: 1
Views: 3323
Reputation: 7130
The : operator cannot be used with character values. Try obtaining the column indices first:
rowSums(county[,(which(names(county)=='H76007'):which(names(county)=='H76025'))])
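A minimal sketch of the same idea using match(), which looks up both positions in one call. The toy county data frame below is made up for illustration; only the H-style column names mirror the question.

```r
# Toy stand-in for the county data; names and values are hypothetical.
county <- data.frame(
  name   = c("A", "B"),
  H76007 = c(1L, 2L),
  H76008 = c(3L, 4L),
  H76009 = c(5L, 6L),
  other  = c(100L, 200L)
)

# match() returns the position of each name, or NA if a name is absent.
idx <- match(c("H76007", "H76009"), names(county))
stopifnot(!anyNA(idx))  # fail early on a misspelled column name

# Build the numeric sequence from the two positions, then sum across rows.
county$MalesOver18 <- rowSums(county[, idx[1]:idx[2]])
```

Checking for NA before building the sequence guards against a typo in a field name, which would otherwise surface as a less obvious error inside the indexing call.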
Upvotes: 2
Reputation: 69171
When indexing data.frames by column name, you can't use the : operator. With numeric values, it creates a sequence:
> 2:5
[1] 2 3 4 5
However, that doesn't work with character data which is what you were seeing:
> "foo":"bar"
Error in "foo":"bar" : NA/NaN argument
In addition: Warning messages:
...
So, what to do? I can think of two options:
Use grepl and some regex magic to identify the column names that you want to return. Here's a trivial example with the mtcars data:
colsToOperateOn <- grepl("mpg|cyl", colnames(mtcars))
> head(mtcars[, colsToOperateOn], 2)
mpg cyl
Mazda RX4 21 6
Mazda RX4 Wag 21 6
You would need to write a regex as complicated as necessary to match the columns you want.
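For the field names in the question, such a regex might look like the sketch below. The vector of column names is invented (it assumes the fields are numbered H76001 to H76030), so adjust the pattern to the real names.

```r
# Hypothetical column names standing in for the Census fields.
cols <- c("GEOID", sprintf("H76%03d", 1:30))

# Anchored pattern: "H760" followed by 07-09, 10-19, or 20-25.
pattern <- "^H760(0[7-9]|1[0-9]|2[0-5])$"
wanted <- grepl(pattern, cols)

# When the names follow a numeric scheme, generating them directly
# avoids the regex altogether and selects the same columns.
wanted2 <- cols %in% sprintf("H76%03d", 7:25)
```

Either logical vector can then be used as the column index, e.g. rowSums(county[, wanted]).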
Alternatively, use which to identify the index of the starting and ending columns you want, and then turn those into a sequence:
start <- which(colnames(mtcars) == "mpg")
end <- which(colnames(mtcars) == "cyl")
> head(mtcars[, start:end], 2)
mpg cyl
Mazda RX4 21 6
Mazda RX4 Wag 21 6
This may be a poor example since mpg and cyl are right next to one another, but it should prove the point.
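A third possibility, not mentioned above, is base R's subset(), whose select argument accepts a range of unquoted column names. The sketch below uses a made-up county data frame; note that the R documentation describes subset() as a convenience function intended mainly for interactive use.

```r
# Toy stand-in for the county data; names and values are hypothetical.
county <- data.frame(
  name   = c("A", "B"),
  H76007 = c(1, 2),
  H76008 = c(3, 4),
  H76009 = c(5, 6),
  other  = c(100, 200)
)

# Inside subset(), select is evaluated with column names mapped to their
# positions, so an unquoted name range works like a numeric sequence.
males <- subset(county, select = H76007:H76009)
county$MalesOver18 <- rowSums(males)
```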
Upvotes: 3