Reputation: 2161
Let's say in R I have a data frame (called df) with a bunch of columns containing integer data named "Var1foo", "Var2foo", and so on.
Now suppose I want to create a new column called sum1 that adds up everything between "Var3foo" and "Var6foo". I might do:
df$sum1 <- rowSums(dplyr::select(df, Var3foo:Var6foo))
Or, I might do something a bit more complicated and create a new column called foobar with apply() like so:
eenie = 3
meenie = 2
df$foobar <- apply(df, 1, function(x) if (sum(x[2:7]) == eenie && sum(x[1:3]) != meenie) 1 else 0)
The problem is I always have to explicitly write out the column names or indices when referring to those columns. What if I want to refer to column "Varxfoo" where x <- 8, or "Varyfoo" where y <- 12?
What I mean is, I can't write df$paste0("Var", x, "foo") or sum(x[paste0("Var", x, "foo"):paste0("Var", y, "foo")]).
I also considered using dplyr::mutate() to create df$sum1 and df$foobar, but it seems to also need explicit column (variable) names.
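For reference, dplyr can work with column names stored in strings. A minimal sketch, assuming dplyr 1.0 or later (for all_of()) and hypothetical toy data: select() takes a character vector through all_of(), and inside mutate() the .data[[name]] pronoun refers to a single column by its string name.

```r
library(dplyr)

# Hypothetical toy data: 2 rows, integer columns Var1foo ... Var12foo
df <- as.data.frame(matrix(1:24, nrow = 2))
names(df) <- paste0("Var", 1:12, "foo")

x <- 3
y <- 6
cols <- paste0("Var", x:y, "foo")  # "Var3foo" "Var4foo" "Var5foo" "Var6foo"

# all_of() lets select() take a character vector of column names
df$sum1 <- rowSums(select(df, all_of(cols)))

# Inside mutate(), .data[[name]] refers to a column whose name is a string
df <- mutate(df, var8 = .data[[paste0("Var", 8, "foo")]])
```

The column names (sum1, var8) are illustrative only.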
What should I do? Thanks!!
Upvotes: 0
Views: 671
Reputation: 2922
Maybe you could refer to the column with
df[paste0("Var", x, "foo")]
If you do this a lot, you could write a small helper function to save some typing:
int2name <- function(x, prefix = "", suffix = "") {
  paste0(prefix, x, suffix)
}
And then you can use:
df[int2name(2:4, prefix = "Var", suffix = "foo")]
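Applied to the two columns from the question, the helper might look like the following sketch (toy data and values are hypothetical). df[[name]] is the programmatic equivalent of df$name, and because apply() hands each row to the function as a named vector, the generated names also work as indices inside it.

```r
int2name <- function(x, prefix = "", suffix = "") {
  paste0(prefix, x, suffix)
}

# Hypothetical toy data: 2 rows, integer columns Var1foo ... Var12foo
df <- as.data.frame(matrix(1:24, nrow = 2))
names(df) <- int2name(1:12, prefix = "Var", suffix = "foo")

x <- 3
y <- 6
# df[[name]] works where df$paste0(...) cannot
v8 <- df[[int2name(8, "Var", "foo")]]

# Row-wise sum of Var<x>foo ... Var<y>foo, selected by name
df$sum1 <- rowSums(df[int2name(x:y, "Var", "foo")])

eenie <- 3
meenie <- 2
# Each row arrives as a named numeric vector, so names can index it
df$foobar <- apply(df, 1, function(row) {
  if (sum(row[int2name(2:7, "Var", "foo")]) == eenie &&
      sum(row[int2name(1:3, "Var", "foo")]) != meenie) 1 else 0
})
```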
Upvotes: 1
Reputation: 522
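A simple solution would be to reference the columns directly by position, with
rowSums(df[, x:y])
for a per-row total like sum1 (sum(df[, x:y]) instead returns a single grand total over the whole block). Of course this only works if the columns are in order, i.e. "VarNfoo" is column N of df.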
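If the "VarNfoo" columns are not at positions 1, 2, …, one option is to look the positions up by name with match() first. A sketch on hypothetical toy data with an extra leading id column; it still assumes the block from "Var3foo" to "Var6foo" is contiguous and in order.

```r
# Hypothetical toy data: an id column, then Var1foo ... Var12foo
df <- data.frame(id = c("a", "b"))
for (i in 1:12) df[[paste0("Var", i, "foo")]] <- c(2L * i - 1L, 2L * i)

x <- 3
y <- 6
# Look up positions by name instead of assuming Var<N>foo is column N
from <- match(paste0("Var", x, "foo"), names(df))
to   <- match(paste0("Var", y, "foo"), names(df))

row_totals  <- rowSums(df[, from:to])  # one value per row
grand_total <- sum(df[, from:to])      # a single number for the whole block
```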
Upvotes: 1