Reputation: 544
i have a very sparse data set - below is a example of the format. I want to make changes to specific columns based on the logic explained below
# create dummy data set
pb=c('1','0','0','0','0','1','Not_ans','1','0','Not_ans')
qa=c('1','1','0','0','1','0','Not_ans','1','Not_ans','Not_ans')
#zy=c('1','Not_ans','0','1','Not_ans','0','1','1','1','Not_ans')
#sub questions for pb
pb.abr=c('1','0','0','0','0','1','0','1','0','0')
pb.ras=c('0','0','0','0','1','0','0','1','0','0')
pb.sfg=c('1','0','0','0','0','0','0','1','0','0')
#sub questions for qa
qa.fgs=c('1','0','0','0','0','0','0','1','0','0')
qa.sdf=c('0','1','0','0','0','0','0','0','0','0')
qa.tyu=c('0','0','0','0','1','0','0','1','0','0')
df=data.frame(pb,qa,pb.abr,pb.ras,pb.sfg,qa.fgs,qa.sdf,qa.tyu)
df
pb qa pb.abr pb.ras pb.sfg qa.fgs qa.sdf qa.tyu
1 1 1 1 0 1 1 0 0
2 0 1 0 0 0 0 1 0
3 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0
5 0 1 0 1 0 0 0 1
6 1 0 1 0 0 0 0 0
7 Not_ans Not_ans 0 0 0 0 0 0
8 1 1 1 1 1 1 0 1
9 0 Not_ans 0 0 0 0 0 0
10 Not_ans Not_ans 0 0 0 0 0 0
The two columns pb and qa are called base columns, and they have further sub columns for with naming convention as pb. and qa. - so we see three sub columns for pa and 3 for qa. I want to make changes to these sub columns based on a condition to the base column ( pa or qa) .
Condition is if column pb =='Not_ans'
then make all sub columns (pb.abr,pb.ras and pb.sfg) = 'Not_applicable'
how do i write a function which achieves this? where i specify the base column name i.e. pb
and naming of sub columns example 'pb.'
below - would it be something like below but it wont give the result
data.frame(ifelse(df['base_q']=='Not_ans',
df[ , grepl( paste('base_q','.') , names(df) )]=='Not_applicable',df[,grepl(
paste('base_q','.') , names(df)) ])
How do i write a generic function which takes the base column numbers as inputs for example 1,2 here - applies the function i.e whereever pb is Not_ans it changes sub_columns ( pb.abr,pb.ras,pb.sfg) to Not applicable and then moves to column 2 ( qa) and applies the same logic?
Upvotes: 1
Views: 86
Reputation: 544
Based on the answer given by @Wen-Ben - the following code worked -
yf=function(df,v,y){
for(i in v:y) {
df[df[i]=='Not_ans',][,names(df)[substr(names(df),1,nchar(colnames(df)[i])+1)==paste0(colnames(df)[i],'.')]]='Not_applicable'
}
return(df)
}
Upvotes: 0
Reputation: 23574
One way would be the following. You can specify which columns you want to apply a function (or functions) in var
in mutate_at()
. Here I used contains()
to specify the column names. Then, I replaced the numeric values in the columns when pb == "Not_ans" with "Not_applicable".
mutate_at(df,
vars(contains("pb.")),
.funs = funs(ifelse(pb == "Not_ans",
"Not_applicable",
.)))
# pb qa pb.abr pb.ras pb.sfg qa.fgs qa.sdf qa.tyu
#1 1 1 2 1 2 1 0 0
#2 0 1 1 1 1 0 1 0
#3 0 0 1 1 1 0 0 0
#4 0 0 1 1 1 0 0 0
#5 0 1 1 2 1 0 0 1
#6 1 0 2 1 1 0 0 0
#7 Not_ans Not_ans Not_applicable Not_applicable Not_applicable 0 0 0
#8 1 1 2 2 2 1 0 1
#9 0 Not_ans 1 1 1 0 0 0
#10 Not_ans Not_ans Not_applicable Not_applicable Not_applicable 0 0 0
If you want to apply the same task for both pb
and qa
, you can use mutate_at()
twice.
mutate_at(df,
vars(contains("pb.")),
.funs = funs(ifelse(pb == "Not_ans",
"Not_applicable",
.))) %>%
mutate_at(vars(contains("qa.")),
.funs = funs(ifelse(qa == "Not_ans", "Not_applicable",.)))
Upvotes: 0
Reputation: 323226
You can do with
yf=function(df,v){
df[df[v]=='Not_ans',][,names(df)[substr(names(df),1,nchar(v)+1)==paste0(v,'.')]]='Not_applicable'
return(df)
}
yf(df,'pb')
pb qa pb.abr pb.ras pb.sfg qa.fgs qa.sdf qa.tyu
1 1 1 1 0 1 1 0 0
2 0 1 0 0 0 0 1 0
3 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0
5 0 1 0 1 0 0 0 1
6 1 0 1 0 0 0 0 0
7 Not_ans Not_ans Not_applicable Not_applicable Not_applicable 0 0 0
8 1 1 1 1 1 1 0 1
9 0 Not_ans 0 0 0 0 0 0
10 Not_ans Not_ans Not_applicable Not_applicable Not_applicable 0 0 0
Data input
df=data.frame(pb,qa,pb.abr,pb.ras,pb.sfg,qa.fgs,qa.sdf,qa.tyu,stringsAsFactors = F)
# notice stringsAsFactors
Upvotes: 2