Reputation: 45
I have a data.frame whose column names start with the prefix X followed by a series of numbers. For example:
col<-c("X1.1","X1.2","X1.3","X1.4","X1.5","X2.1","X2.2","X2.3","X2.4","X2.5","X3.1","X3.2","X3.3","X3.4","X3.5")
m<-matrix(sample(1:15),ncol=15,nrow=5)
mf<-data.frame(m)
colnames(mf)<-col
Then, for each row, I want to find the maximum value within each prefix group: X1 (four columns in total), X2 (four columns), X3 (four columns), and so on. For each group I want to return the column number (the number that follows the prefix) of the maximum value.
So my expected output is
X1 X2 X3 X4
1 4 2 4 ...
...
Can anyone help me with this? If a row has two tied maximum values, I would like both column names returned as well.
From searching, it seems that which() should be used, but I am not sure how.
Upvotes: 1
Views: 1277
Reputation: 55390
To reshape your data use the following:
library(reshape2)
mf.melted <- melt(data=mf)
mf.melted$group <- unlist(gsub("\\.\\d+$", "", as.character(mf.melted$variable)))
mf.melted
unlist(gsub("\\.\\d+$", "", as.character(mf.melted$variable)))
## Original column names are now stored as column `'variable'` in `mf.melted`
mf.melted$variable
## Notice it is a `factor` column, so it needs to be converted to a string. This is done with:
as.character( __ )
## Next we remove the trailing `.3` (or whatever number) from each name.
## The regular expression '\\.\\d+$' looks for
`\\.` # a literal period
`\\d` # a digit
`\\d+` # one or more digits
`$` # the end of the string
## gsub replaces every match of the pattern (first argument) with the replacement (second argument),
## in this case an empty string
gsub("\\.\\d+$", "", __ )
## We then assign the results back into a new column, namely `'group'`
mf.melted$group <- __
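To see what the pattern does on a few column names, here is a minimal sketch. The vector `cols` (including the extra name X10.12) is made up for illustration. `sub` is used rather than `gsub` because the `$` anchor allows at most one match per name.

```r
# Hypothetical column names, following the question's naming scheme
cols <- c("X1.1", "X1.5", "X2.3", "X10.12")

# Drop the trailing ".<digits>" to get the group prefix
groups <- sub("\\.\\d+$", "", cols)

# Drop everything up to and including the last period to get the column number
index <- as.integer(sub("^.*\\.", "", cols))

print(data.frame(cols, groups, index))
```

Extracting the number as well as the prefix is useful here, since the question asks for the column number within each group.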
Now, with your melted data.frame, you can easily search and aggregate by the column `group`:
head(mf.melted)
  variable value group
1     X1.1     3    X1
2     X1.1     4    X1
3     X1.1    12    X1
4     X1.1    14    X1
5     X1.1     7    X1
6     X1.2     6    X1
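Note that melting without an id column loses track of which row each value came from, and the question asks for a per-row result. One possible way to carry on from here is sketched below. It assumes a small made-up data frame, and the helper columns `row` and `index` are names introduced only for this illustration. Ties are reported as comma-separated column numbers.

```r
library(reshape2)

# Made-up data: row 2 has a tie within the X1 group
mf <- data.frame(X1.1 = c(3, 9), X1.2 = c(7, 9),
                 X2.1 = c(5, 1), X2.2 = c(2, 4))

# Keep the row number as an id variable so it survives the melt
mf$row <- seq_len(nrow(mf))
mf.melted <- melt(mf, id.vars = "row")

# Split the original column name into group prefix and column number
mf.melted$variable <- as.character(mf.melted$variable)
mf.melted$group <- sub("\\.\\d+$", "", mf.melted$variable)
mf.melted$index <- as.integer(sub("^.*\\.", "", mf.melted$variable))

# Flag every value that equals the maximum of its row/group combination
is.max <- with(mf.melted, value == ave(value, row, group, FUN = max))
winners <- mf.melted[is.max, ]

# Collapse tied column numbers into one string per row/group
result <- aggregate(index ~ row + group, data = winners,
                    FUN = function(i) paste(i, collapse = ","))

# Back to one line per row and one column per group
wide <- dcast(result, row ~ group, value.var = "index")
print(wide)
```

For this input, row 1 gets column 2 for X1 and column 1 for X2, while row 2 gets "1,2" for X1 because both columns hold the maximum.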
Upvotes: 2
Reputation: 2526
Recreate the example data (please use reproduce or dput in the future):
df = data.frame(matrix(rep(NA,12*3),nrow=3))
colnames(df) = strsplit("X1.1 X1.2 X.3 X.4 X2.1 X2.2 X2.3 X2.4 X3.1 X3.2 X3.3 X3.4",split=" ")[[1]]
df[] = lapply(df, function(x) sample(1:10, 3))
Get the distinct column-name prefixes:
xTypes = unique(sapply(colnames(df), function(x) { strsplit(x,"\\.")[[1]][1] } ))
Get the maximum per prefix (taken over all rows of that group):
result = sapply(xTypes,function(x) { max(df[,grep(paste(x,"\\.",sep=""),colnames(df))]) })
> sapply(xTypes,function(x) { max(df[,grep(paste(x,"\\.",sep=""),colnames(df))]) })
X1  X X2 X3
 9  9 10  9
If you want the index, within each prefix group, of the column that holds the maximum:
result = sapply(xTypes,function(x) { which.max(apply(df[,grep(paste(x,"\\.",sep=""),colnames(df))],2,max)) })
names(result) = xTypes
Now the result is:
X1  X X2 X3
 1  1  2  1
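The code above takes the maximum over all rows of each group. If the per-row result from the question is wanted, including ties, a base R sketch along the following lines might work. The data frame is a small made-up example, and `prefix`, `suffix` and `row_argmax` are names chosen here for illustration only.

```r
# Made-up data: row 2 has a tie within the X1 group
df <- data.frame(X1.1 = c(3, 9), X1.2 = c(7, 9),
                 X2.1 = c(5, 1), X2.2 = c(2, 4))

# Group prefix and within-group column number for every column
prefix <- sub("\\.\\d+$", "", colnames(df))
suffix <- sub("^.*\\.", "", colnames(df))

# For one group of column positions, return per row the column numbers
# that hold the row maximum; ties are joined with commas
row_argmax <- function(cols) {
  block <- as.matrix(df[, cols, drop = FALSE])
  apply(block, 1, function(r) paste(suffix[cols][r == max(r)], collapse = ","))
}

# One column per prefix, one row per row of df
result <- sapply(split(seq_along(prefix), prefix), row_argmax)
print(result)
```

The comparison `r == max(r)` plays the role of `which` from the question: unlike `which.max`, it keeps every position that ties for the maximum.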
Upvotes: 3