NiuBiBang

Reputation: 628

Error with rowSums using column names

I am trying to segment Census data that is fairly disaggregated (e.g. age variables in 5-year groups) and create summary variables by aggregation (e.g. all males 18+ per county). My solution is rowSums, e.g. county$MalesOver18 <- rowSums(county[,c(68:87)]), where columns 68-87 sum to males 18+. This works fine. However, with 500 variables it is not efficient to count out the positions of my start and end columns.

But when I use my preferred solution, column names in rowSums (e.g. rowSums(county[,c(H76007:H76025)]), where the H variables are field names), I get one of two error messages:

Run with column names in quotes:

Error in "H76007":"H76025" : NA/NaN argument
In addition: Warning messages:
1: In `[.data.frame`(county, , c("H76007":"H76025")) : NAs introduced by coercion
2: In `[.data.frame`(county, , c("H76007":"H76025")) : NAs introduced by coercion

Run with column names not in quotes:

Error in `[.data.frame`(county, , c(H76007:H76025)) : object 'H76007' not found

I have tried using the na.rm argument and converting my variables to numeric (although they are already integers), to no avail.

any guidance? thanks.

Upvotes: 1

Views: 3323

Answers (2)

Nishanth

Reputation: 7130

The : operator cannot be used on character values. First obtain the column indices:

rowSums(county[,(which(names(county)=='H76007'):which(names(county)=='H76025'))])
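As a minimal sketch of the same idea, here it is applied to a toy data frame. The column names and values below are invented stand-ins for the real county table, and first and last are illustrative helper names:

```r
# Toy stand-in for the county data; names and values are made up.
county <- data.frame(
  NAME   = c("A", "B"),
  H76007 = c(1L, 2L),
  H76008 = c(3L, 4L),
  H76025 = c(5L, 6L)
)

first <- which(names(county) == "H76007")  # position of the first column to sum
last  <- which(names(county) == "H76025")  # position of the last column to sum

# first:last is now an ordinary integer sequence, so : works as usual.
county$MalesOver18 <- rowSums(county[, first:last])
```

This relies on the columns being stored in the order you expect, since everything between the two positions is summed.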

Upvotes: 2

Chase

Reputation: 69171

When indexing data.frames by the column names, you can't use the : operator. When you do this with numeric values, it creates a sequence:

> 2:5
[1] 2 3 4 5

However, that doesn't work with character data, which is the error you were seeing:

> "foo":"bar"
Error in "foo":"bar" : NA/NaN argument
In addition: Warning messages:
...

So, what to do? I can think of two options:

  1. Use grepl and some regex magic to identify the column names that you want to return. Here's a trivial example with the mtcars data:

> colsToOperateOn <- grepl("mpg|cyl", colnames(mtcars))
> head(mtcars[, colsToOperateOn], 2)
              mpg cyl
Mazda RX4      21   6
Mazda RX4 Wag  21   6

You would need to write a regex as complicated as necessary to match the columns you want.

  2. Use which to identify the index of the starting and ending columns you want, and then turn those into a sequence:

> start <- which(colnames(mtcars) == "mpg")
> end <- which(colnames(mtcars) == "cyl")
> head(mtcars[, start:end], 2)
              mpg cyl
Mazda RX4      21   6
Mazda RX4 Wag  21   6

This may be a poor example since mpg and cyl are right next to one another, but it should prove the point.
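For something closer to the question, here is a rough sketch of both options on a toy data frame with census-style names. The data, the helper names (is_h, num, idx and so on), and the assumption that the numeric part of each H name orders the columns are all mine, not from the real table. The last step uses subset(), which is a different base R technique from the two options above: its select argument accepts a name range directly.

```r
# Toy stand-in for the census table; names and values are invented.
county <- data.frame(
  NAME   = c("A", "B"),
  H76006 = c(10, 20),
  H76007 = c(1, 2),
  H76015 = c(3, 4),
  H76025 = c(5, 6),
  H76026 = c(30, 40)
)
nms <- colnames(county)

# Option 1: match the naming pattern, then filter on the numeric part.
is_h <- grepl("^H[0-9]+$", nms)
num <- rep(NA_integer_, length(nms))
num[is_h] <- as.integer(sub("^H", "", nms[is_h]))
by_pattern <- is_h & num >= 76007 & num <= 76025  # logical column selector

# Option 2: look up the start and end positions by name.
# match() returns NA for a name that is not present, so check before using it.
idx <- match(c("H76007", "H76025"), nms)
stopifnot(!anyNA(idx))
by_position <- idx[1]:idx[2]

# Alternative: subset() allows a name range in select.
by_name <- subset(county, select = H76007:H76025)

sums <- rowSums(county[, by_pattern])
```

The pattern-based selector does not care where the columns sit in the data frame, while the positional lookup and subset() both take every column between the two endpoints, so they depend on column order.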

Upvotes: 3
