Reputation: 1260

R, create a new column in a data frame that applies a function of all the columns with similar names

I have a data frame whose column names look something like a, b, v1, v2, v3...v100. I want to create a new column that applies a function to only the columns whose names include 'v'.

For example, given this data frame

df <- data.frame(a = rnorm(3), v1 = rnorm(3), v2 = rnorm(3), v3 = rnorm(3))

I want to create a new column in which each element is the sum of the elements of v1, v2 and v3 that are in the same row.

Upvotes: 3

Answers (3)

adamleerich

Reputation: 5889

To combine both @James's and @Anatoliy's answers,

apply(df[grepl('^v', names(df))], 1, sum)

I went ahead and anchored the v in the regular expression to the beginning of the string. The other answers don't do that, but it appears that you want all columns that begin with v, not the larger set that may have a v anywhere in their name. If I am wrong, you could just do

apply(df[grepl('v', names(df))], 1, sum)
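To make the difference between the two patterns concrete, here is a minimal sketch. The data frame `df2` and its `avg` column are made up for illustration, and the stricter `^v[0-9]+$` pattern is only one possible way to match v1...v100, not something the question asked for.

```r
# Hypothetical data frame: 'avg' contains a "v" but does not start with one.
set.seed(1)
df2 <- data.frame(a = rnorm(3), avg = rnorm(3), v1 = rnorm(3), v2 = rnorm(3))

# Anchored pattern: only names that begin with "v".
anchored <- names(df2)[grepl("^v", names(df2))]

# Unanchored pattern: any name with a "v" anywhere, so 'avg' is picked up too.
unanchored <- names(df2)[grepl("v", names(df2))]

# Stricter (assumed) pattern: "v" followed only by digits, e.g. v1 ... v100.
strict <- names(df2)[grepl("^v[0-9]+$", names(df2))]

print(anchored)
print(unanchored)
print(strict)
```

If a column such as `avg` could ever appear in the data, the anchored or stricter pattern is probably the safer choice.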

You should avoid using subset() when programming, as stated in ?subset:

This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like ‘[’, and in particular the non-standard evaluation of argument ‘subset’ can have unanticipated consequences.

Also, as I learned yesterday from Richie Cotton, when indexing it is better to use grepl than grep.
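The answer does not say why, so the following is my own reading rather than the original reasoning: one well-known pitfall is that grep returns an empty integer vector when nothing matches, and negating that for exclusion selects nothing instead of everything. A small sketch, using a hypothetical data frame `df3` with no v columns:

```r
# Hypothetical data frame with no columns starting with "v".
df3 <- data.frame(a = 1:3, b = 4:6)

idx_int <- grep("^v", names(df3))    # integer(0): no positions matched
idx_lgl <- grepl("^v", names(df3))   # FALSE FALSE: one flag per column

# Intent: drop the v columns and keep everything else.
kept_int <- df3[-idx_int]   # -integer(0) is still integer(0), so no columns are kept
kept_lgl <- df3[!idx_lgl]   # logical negation keeps a and b, as intended

print(ncol(kept_int))
print(ncol(kept_lgl))
```

With a logical index, `!` behaves the same whether or not anything matched, which makes grepl the more predictable choice for indexing.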

Upvotes: 3

James

Reputation: 66834

Use grep on names to get the column positions, then use rowSums:

rowSums(df[, grep("v", names(df))])
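Two practical details are worth sketching here: storing the result as a new column, and guarding against the case where only one column matches. With comma-style indexing, a single matching column is simplified to a plain vector and rowSums fails on it; `drop = FALSE` keeps it a data frame. The column name `total` and the data frame `one_v` below are hypothetical, chosen only for illustration.

```r
set.seed(42)
df <- data.frame(a = rnorm(3), v1 = rnorm(3), v2 = rnorm(3), v3 = rnorm(3))

# Compute the positions first; 'total' is named so it cannot match "^v" later.
v_cols <- grep("^v", names(df))
df$total <- rowSums(df[, v_cols, drop = FALSE])

# Edge case: only one matching column.
one_v <- data.frame(a = 1:3, v1 = c(10, 20, 30))
idx <- grep("^v", names(one_v))
# Without drop = FALSE, one_v[, idx] would be a numeric vector, not a data frame.
one_v$total <- rowSums(one_v[, idx, drop = FALSE])

print(df)
print(one_v)
```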

Upvotes: 6

Anatoliy

Reputation: 1380

This should do it:

df$sums <- rowSums(subset(df, select = grepl("v", names(df))))

For a more general approach:

apply(subset(df, select=grepl("v", names(df))), 1, sum)
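The point of the apply form is that `sum` can be swapped for any function of one row's values. As a rough sketch, here is a row-wise spread (maximum minus minimum) stored in a new column. The column name `spread` is made up, and the sketch uses `[` with an anchored pattern rather than subset(), which is my substitution, not part of this answer.

```r
set.seed(7)
df <- data.frame(a = rnorm(3), v1 = rnorm(3), v2 = rnorm(3), v3 = rnorm(3))

# Logical index of the columns whose names begin with "v".
v_cols <- grepl("^v", names(df))

# apply() converts the selected columns to a matrix and calls the function
# once per row (MARGIN = 1); each call receives that row's v values.
df$spread <- apply(df[v_cols], 1, function(row) max(row) - min(row))

print(df)
```

For plain sums or means, rowSums and rowMeans are usually faster than apply; apply is mainly useful when no vectorised row function exists.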

Upvotes: 2
