Reputation: 823
I am trying to create a new column based on few other column names and their values in that data frame.
The data looks something like this
user_id Gender Age Cate_Ch_Bot Cate_Ch_Phy Cate_Ch_Chem Cate_Ch_Comp Cate_Ch_Zoo
0001 F 13 0 1 0 1 0
0002 M 17 1 1 0 0 0
0003 F 13 0 0 0 0 0
0004 F 12 0 0 1 0 0
0005 F 14 0 1 0 0 1
0006 M 16 0 0 0 0 0
I need to create a category column, which will have all the categories that have the value of 1. If the user does not have a category then it be blank or NA.
So the desired output will be:
user_id Gender Age Cate_Ch_Bot Cate_Ch_Phy Cate_Ch_Chem Cate_Ch_Comp Cate_Ch_Zoo Ch_Category
0001 F 13 0 1 0 1 0 Phy:Comp
0002 M 17 1 1 0 0 0 Bot:Phy
0003 F 13 0 0 0 0 0 NA
0004 F 12 0 0 1 0 0 Chem
0005 F 14 0 1 0 0 1 Phy:Zoo
0006 M 16 0 0 0 0 0 NA
I am trying to loop through the column names, but not sure how to do it correctly.
test$category = ""
for (j in 1:dim(test)[1]){
for (i in colnames(test[4:14])){
name = colnames(test[i])
if (test[j,name] == 1){
test$category[j] = paste(test$category[j], colnames(test[i]),sep=":")
}
}
}
I would deeply appreciate any help in this regard.
Upvotes: 0
Views: 2346
Reputation: 18612
How about something like this:
Df <- data.frame(
user_id=1:6,
Gender=rep(c("M","F"),3),
Age=sample(13:17,6,replace=TRUE),
Cate_Ch_Bot=c(0,1,rep(0,4)),
Cate_Ch_Phy=c(1,1,0,0,1,0),
Cate_Ch_Chem=c(0,0,0,1,0,0),
Cate_Ch_Comp=c(1,0,0,0,0,0),
Cate_Ch_Zoo=c(0,0,0,0,1,0),
stringsAsFactors=FALSE)
##
Labs <- gsub("Cate_Ch_","",names(Df)[-c(1:3)])
##
getCols <- function(x)
{
Reduce(function(x,y){paste0(x,":",y)},Labs[which(x==1)])
}
##
Df$new <- apply(Df[,-c(1:3)],1,function(X){
if( is.null(getCols(X)) ){
""
} else {
getCols(X)
}
})
##
> Df2
user_id Gender Age Cate_Ch_Bot Cate_Ch_Phy Cate_Ch_Chem Cate_Ch_Comp Cate_Ch_Zoo new
1 1 M 13 0 1 0 1 0 Phy:Comp
2 2 F 14 1 1 0 0 0 Bot:Phy
3 3 M 16 0 0 0 0 0
4 4 F 14 0 0 1 0 0 Chem
5 5 M 14 0 1 0 0 1 Phy:Zoo
6 6 F 16 0 0 0 0 0
Edited:
I had to wrap the getCols
with an if..else
statement inside of the apply
function because it was actually returning a list
instead of a vector
, with NULL
elements for the rows of Df
where none of the columns had a value of 1
. Previously, it looked like a data.frame
on the surface, but a closer inspection reveals this:
> class(Df)
[1] "data.frame"
> str(Df)
'data.frame': 6 obs. of 9 variables:
$ user_id : int 1 2 3 4 5 6
$ Gender : chr "M" "F" "M" "F" ...
$ Age : int 13 14 16 14 14 16
$ Cate_Ch_Bot : num 0 1 0 0 0 0
$ Cate_Ch_Phy : num 1 1 0 0 1 0
$ Cate_Ch_Chem: num 0 0 0 1 0 0
$ Cate_Ch_Comp: num 1 0 0 0 0 0
$ Cate_Ch_Zoo : num 0 0 0 0 1 0
$ new :List of 6
..$ : chr "Phy:Comp"
..$ : chr "Bot:Phy"
..$ : NULL
..$ : chr "Chem"
..$ : chr "Phy:Zoo"
..$ : NULL
which is undesirable. As for an explanation of what's going on in the solution,
Labs <- gsub("Cate_Ch_","",names(Df)[-c(1:3)])
is just a convenience step, so that there is a vector of ready-made labels to refer to. gsub
is taking the names of the target columns and replacing "Cate_Ch_"
with an empty string ""
, so that the remaining text can be used as a label. getCols
function is structured to operate on a single vector x
- in this case, a single row of Df
. It uses Reduce
to apply a sub-operation (pasting two strings, separated by a :
) in a cumulative manner, where this sub-operation is given in terms of an anonymous function function(x,y){ ... }
. The input we give to function(x,y)
is a subset of our entire Labs
vector - the subset begin only those elements in the row where x==1
. Using which(x==1)
just gives us the indices of the row equal to one. So for row 2 of Df
, which(x==1)
gives the vector c(1,2)
(since Cate_Ch_Bot
and Cate_Ch_Phy
have a value of 1
in row 2). Evaluating this vector of indices inside of Labs
gives you elements 1
and 2
of Labs
- c("Bot","Phy")
. When this is passed to our Reduce(function(x,y) ...
call, it pastes together all of the elements, separated by a :
, and returns a single character value "Bot:Phy"
. If the input to Reduce
had been c("A","B","C","D")
, it would have returned "A:B:C:D"
, and so on. apply
is used to perform the operation over multiple rows. As I noted above, I had to make a slight change to my original call to apply
to make sure that it returned a vector
and not a list
. And after all that typing... @Richard Scriven points out a better answer:
> apply(Df[-(1:3)] == 1, 1, function(x) {
paste(gsub(".*_", "", names(which(x))), collapse = ":")
})
[1] "Phy:Comp" "Bot:Phy" "" "Chem" "Phy:Zoo" ""
Upvotes: 3