Reputation: 1260
I have a data frame whose columns are named something like a, b, v1, v2, v3, ..., v100. I want to create a new column by applying a function to only the columns whose names include 'v'.
For example, given this data frame
df<-data.frame(a=rnorm(3),v1=rnorm(3),v2=rnorm(3),v3=rnorm(3))
I want to create a new column in which each element is the sum of the elements of v1, v2 and v3 that are in the same row.
Upvotes: 3
Views: 1045
Reputation: 5889
To combine both @James's and @Anatoliy's answers,
apply(df[grepl('^v', names(df))], 1, sum)
I anchored the v to the beginning of the string in the regular expression. The other answers don't do that, but it appears that you want only the columns whose names begin with v, not the larger set that merely have a v somewhere in their name. If I am wrong, you could just do
apply(df[grepl('v', names(df))], 1, sum)
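To make the difference between the two patterns concrete, here is a small sketch. The data frame and its `avg` column are made up for illustration and are not from the question.

```r
# Hypothetical data frame: "avg" contains a "v" but does not start with one
df <- data.frame(a = 1:3, avg = 4:6, v1 = 7:9, v2 = 10:12)

starts_with_v <- grepl("^v", names(df))  # matches v1 and v2 only
contains_v    <- grepl("v",  names(df))  # also matches avg

sum_anchored   <- rowSums(df[starts_with_v])  # v1 + v2
sum_unanchored <- rowSums(df[contains_v])     # avg + v1 + v2
```

If the real names are strictly v followed by digits, a tighter pattern such as '^v[0-9]+$' would probably be safer still.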
You should avoid using subset() when programming, as stated in ?subset:
This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like ‘[’, and in particular the non-standard evaluation of argument ‘subset’ can have unanticipated consequences.
Also, as I learned yesterday from Richie Cotton, when indexing it is better to use grepl than grep.
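The reason is not spelled out here. One commonly cited pitfall, which may or may not be the one meant, shows up when you exclude columns and nothing matches. A minimal sketch with a made-up data frame:

```r
df <- data.frame(a = 1:3, v1 = 4:6, v2 = 7:9)

# No column name contains "z", so grep() returns integer(0)
idx <- grep("z", names(df))

# Negating an empty integer index selects nothing instead of everything
n_grep <- ncol(df[-idx])

# The logical vector from grepl() negates cleanly and keeps all columns
n_grepl <- ncol(df[!grepl("z", names(df))])
```

Here n_grep is 0 and n_grepl is 3, so the grep version silently drops every column.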
Upvotes: 3
Reputation: 66834
Use grep on names to get the column positions, then use rowSums:
rowSums(df[,grep("v",names(df))])
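One caveat with the df[, cols] form: if only a single column matches, the result collapses to a plain vector and rowSums fails. The sketch below uses a made-up one-match data frame and adds drop = FALSE as a guard. It also anchors the pattern with ^, which is an assumption about which columns are wanted.

```r
# Hypothetical data frame with only one column starting with "v"
df <- data.frame(a = 1:3, v1 = 4:6)
idx <- grep("^v", names(df))

# Without drop = FALSE, df[, idx] is a vector and rowSums() raises an error
res <- try(rowSums(df[, idx]), silent = TRUE)

# drop = FALSE keeps the data frame structure, so rowSums() works
one_col <- df[, idx, drop = FALSE]
df$sums <- rowSums(one_col)
```

Single-bracket indexing without the comma, as in df[idx], also keeps the data frame structure.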
Upvotes: 6
Reputation: 1380
This should do it:
df$sums<- rowSums(subset(df, select=grepl("v", names(df))))
For a more general approach:
apply(subset(df, select=grepl("v", names(df))), 1, sum)
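As a rough illustration of why apply is the more general form, the sketch below swaps sum for max on made-up data. It uses `[` rather than subset() for the column selection and anchors the pattern with ^, neither of which is required.

```r
# Hypothetical data, chosen so the row-wise results are easy to check by hand
df <- data.frame(a = 1:3, v1 = c(4, 1, 6), v2 = c(2, 8, 3))
v_cols <- df[grepl("^v", names(df))]

row_sum <- apply(v_cols, 1, sum)  # same values as rowSums(v_cols)
row_max <- apply(v_cols, 1, max)  # any row-wise function can be plugged in

df$sums <- row_sum
df$maxs <- row_max
```

For plain sums, rowSums is usually the faster and clearer choice. apply is mainly useful when no dedicated row-wise function exists.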
Upvotes: 2