semibah

Reputation: 3

R: Applying grep sequentially when pattern is a vector and bind results to a matrix

I have a named matrix with the following 3 part name structure (xxx-#h-#):

    xxx-0h-0 | xxx-0h-1 | xxx-0h-2 | xxx-1h-0 | ... | xxx-60h-2
v1
v2
v3
...
vn

I want to find which columns share the first two parts of their name. The first part (xxx) is fixed, the vector "names" holds every possible value of the middle part, and the last part varies. I build the search patterns by pasting the first two parts together:

names <- c("0h","1h","6h","16h","24h","42h","60h")
names <- paste("XXX", names, sep = "-")

I am using grep for the lookup:

grep(names[1],colnames(x))

Which correctly returns:

[1] 1 2 3

I then want to take the columns that grep returns and compute their row means, so that all observations sharing the first and second parts of the name are averaged into a new variable.

For example,

xxx_0h <- rowMeans(cbind(x[,grep(names[1],colnames(x))]))

gives me the row means of columns 1, 2 and 3, which grep found above.

When I pass the whole "names" vector instead of a single element, I get the following warning:

Warning message:
In grep(names, colnames(x)) :
  argument 'pattern' has length > 1 and only the first element will be used

How can I apply grep to every element of the pattern vector in turn, rather than only the first?
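For context, grep() takes a single pattern, which is why only the first element is used. Below is a minimal sketch of calling it once per pattern with lapply. The matrix `x`, its values and the vector `patterns` are made up for illustration and are not the real data:

```r
# Hypothetical stand-in for the real matrix: 2 rows, 6 columns
x <- matrix(1:12, nrow = 2,
            dimnames = list(c("v1", "v2"),
                            c("XXX-0h-0", "XXX-0h-1", "XXX-0h-2",
                              "XXX-1h-0", "XXX-1h-1", "XXX-1h-2")))
patterns <- c("XXX-0h", "XXX-1h")

# grep() accepts one pattern at a time, so call it once per pattern;
# fixed = TRUE treats each pattern as a literal string, not a regex
idx <- lapply(patterns, function(p) grep(p, colnames(x), fixed = TRUE))
names(idx) <- patterns
idx
```

Each list element holds the column indices for one pattern (columns 1 to 3 for "XXX-0h" and 4 to 6 for "XXX-1h" in this toy matrix).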

Essentially, I'd like the following to happen:

xxx_0h <- rowMeans(cbind(x[,grep(names[1],colnames(x))]))
xxx_1h <- rowMeans(cbind(x[,grep(names[2],colnames(x))]))
xxx_6h <- rowMeans(cbind(x[,grep(names[3],colnames(x))]))
xxx_16h <- rowMeans(cbind(x[,grep(names[4],colnames(x))]))
xxx_24h <- rowMeans(cbind(x[,grep(names[5],colnames(x))]))
xxx_42h <- rowMeans(cbind(x[,grep(names[6],colnames(x))]))
xxx_60h <- rowMeans(cbind(x[,grep(names[7],colnames(x))]))

and then bind the resulting numeric vectors into a matrix that keeps the row names (which are shared among all columns) and drops the last part of the column names (xxx-0h | xxx-1h | xxx-6h | ...). I would end up with a matrix of 7 columns and n rows.

My last resort would be to use a for loop. Is there an elegant way to do this using apply or any of its variants?

Upvotes: 0

Views: 744

Answers (1)

effel

Reputation: 1421

Edit: Right, I see what you're looking for now. Here's a full example, starting with two pairs of columns that share a middle name.

mid <- c("0h", "6h")
name <- paste(rep("XXX", 4), rep(mid, each = 2), 1:2, sep="-")
df = setNames(cbind(cars, cars), name)
df = df[1:4, ]
df
#   XXX-0h-1 XXX-0h-2 XXX-6h-1 XXX-6h-2
# 1        4        2        4        2
# 2        4       10        4       10
# 3        7        4        7        4
# 4        7       22        7       22

With the data set up, call rowMeans over the table as many times as there are middle names, each time subsetting the table to the columns whose names include a given middle name.

sapply(mid, function(x) rowMeans(df[grep(x, names(df))]))
#     0h   6h
# 1  3.0  3.0
# 2  7.0  7.0
# 3  5.5  5.5
# 4 14.5 14.5
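One caveat with the full set of time points from the question: a bare pattern such as "0h" also matches "60h", and "6h" also matches "16h". A sketch of one way around this is to match the middle part together with its surrounding hyphens, and then relabel the columns with the prefix. It assumes every column name follows the prefix-time-replicate layout, and the matrix `x` and its values are made up for illustration:

```r
# Made-up matrix with time points that collide as substrings ("0h" vs "60h")
mid <- c("0h", "60h")
cols <- paste("XXX", rep(mid, each = 2), 0:1, sep = "-")
x <- matrix(c(1, 2, 3, 4, 10, 20, 30, 40), nrow = 2,
            dimnames = list(c("v1", "v2"), cols))

# Match "-0h-" rather than "0h", so "XXX-60h-0" is not picked up for 0h
group_means <- sapply(mid, function(m) {
  keep <- grep(paste0("-", m, "-"), colnames(x), fixed = TRUE)
  # drop = FALSE keeps a matrix even if only one column matches
  rowMeans(x[, keep, drop = FALSE])
})

# Label columns with the first two name parts, dropping the last one
colnames(group_means) <- paste("XXX", mid, sep = "-")
group_means
```

The result keeps the row names of `x` and has one column per time point, which is the shape the question asks for.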

Upvotes: 0
