Reputation: 303
I have a data.frame named factor_nonagg with 50 rows and 3 columns. I wrote a function category() with argument factors. I am making changes to factors in the function. When I pass the data.frame to this function, no changes are made in the data.frame. Can someone help me in making permanent changes to my data.frame?
n=50
category=function(factors){
for(i in 1:n){
if(factors[i,1]>=90) factors[i,1]<-2*.45
else if(factors[i,1]>=65) factors[i,1]<-1*.45
else factors[i,1]<-0
if(factors[i,2]>=.190) factors[i,2]<-2*.25
else if(factors[i,2]>=.140) factors[i,2]<-1*.25
else factors[i,2]<-0
if(factors[i,3]>=.03) factors[i,3]<-2*.30
else if(factors[i,3]>=.015) factors[i,3]<-1*.30
else factors[i,3]<-0
}}
category(factor_nonagg)
Upvotes: 1
Views: 122
Reputation: 263481
Looping through rows of dataframes is going to be painfully slow. This is a vectorized approach that is admittedly untested in the absence of data but does not throw an error with the other test data offered by dardisco:
category=function(factors){
factors[[1]] <- 0.45*(0:2)[ findInterval(factors[[1]], c(-Inf, 65, 90, Inf) )]
factors[[2]] <- 0.25*(0:2)[ findInterval(factors[[2]], c(-Inf, 0.140, 0.190, Inf) )]
factors[[3]] <- 0.30*(0:2)[ findInterval(factors[[3]], c(-Inf, 0.015, 0.03, Inf) )]
return(factors) }
And, of course, as with all functional languages, factor_nonagg would not be modified except with an assignment command:
category(factor_nonagg) # no effect
factor_nonagg <- category(factor_nonagg) # replacement occurs.
findInterval is a very useful vector-oriented function that can either be used to return a grouping value or used, as in this example, as an index to select from a set of either character or numeric values.
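To make the indexing idea concrete, here is a minimal sketch using the first column's breakpoints from the question. The sample vector x and the labels are made up for illustration; only findInterval and the cut points 65 and 90 come from the thread.

```r
# Hypothetical sample values for the first column (not from the question)
x <- c(10, 65, 89.9, 90, 120)

# Intervals are closed on the left, so a value equal to 65 lands in interval 2
idx <- findInterval(x, c(-Inf, 65, 90, Inf))
idx
# 1 2 2 3 3

# Use the interval number as an index into the multipliers 0, 1, 2
scores <- 0.45 * (0:2)[idx]
scores
# 0.00 0.45 0.45 0.90 0.90

# The same index can select from character values (labels are illustrative)
labels <- c("low", "mid", "high")[idx]
labels
# "low" "mid" "mid" "high" "high"
```

Because the intervals are closed on the left, this matches the >= comparisons in the original loop.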
Upvotes: 1
Reputation: 5274
You could approach it like this:
set.seed(1)
df1 <- data.frame(
f1 = sample(seq(150), size=50, replace=TRUE),
f2 = sample(seq(250) / 1000, size=50, replace=TRUE),
f3 = sample(seq(50) / 1000, size=50, replace=TRUE)
)
### vals1 = values
### mult1 = multiplier
fun1 <- function(x, vals1, mult1){
if (x >= max(vals1)) return(mult1*2)
if (x >= min(vals1) & x < max(vals1)) return(mult1)
return(0)
}
df1 <- within(df1, {
f1 <- sapply(f1, fun1, vals1=c(90, 65), mult1=0.45)
f2 <- sapply(f2, fun1, vals1=c(0.19, 0.14), mult1=0.25)
f3 <- sapply(f3, fun1, vals1=c(0.03, 0.015), mult1=0.3)
})
This avoids the for loop (although short loops are not necessarily a bad thing), saves on typing and allows it to be more easily generalized if you want to change the values or multiplier. I'm using return in fun1 as it has multiple exit points.
Upvotes: 0
Reputation: 206566
R does not easily support pass-by-reference type behavior with functions. When you make a change to a parameter value within a function, a copy of the object is made and the changes last only as long as the function call.
Typically you have your function return the changed value (return(factors)), and assign that new value to the original variable:
factor_nonagg <- category(factor_nonagg)
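The copy behavior described above can be seen in a small sketch. The function set_first_to_zero and the data frame d are hypothetical names used only for this illustration.

```r
# Hypothetical toy function: changes its local copy and returns it
set_first_to_zero <- function(df) {
  df[1, 1] <- 0   # modifies only the copy inside the function
  df              # return the modified copy
}

d <- data.frame(a = c(5, 6), b = c(7, 8))

# Calling without assignment: the returned copy is discarded
invisible(set_first_to_zero(d))
unchanged <- d[1, 1]
unchanged
# 5

# Assigning the result back replaces the original object
d <- set_first_to_zero(d)
changed <- d[1, 1]
changed
# 0
```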
Upvotes: 1
Reputation: 490
You need to set an output object in your function that returns the changes you make to your df. This is achieved by adding
return(factors)
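just before the last curly bracket of your function definition, that is, after the loop's closing bracket. You then still need to assign the result back when calling it: factor_nonagg <- category(factor_nonagg).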
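Put together, the question's function with the return added might look like the sketch below. Two things here are assumptions rather than part of the original post: the loop uses seq_len(nrow(factors)) instead of the global n, and the three-row data frame toy is invented purely to show the call.

```r
category <- function(factors) {
  # Assumption: loop over the actual row count rather than a global n
  for (i in seq_len(nrow(factors))) {
    if (factors[i, 1] >= 90) factors[i, 1] <- 2 * .45
    else if (factors[i, 1] >= 65) factors[i, 1] <- 1 * .45
    else factors[i, 1] <- 0
    if (factors[i, 2] >= .190) factors[i, 2] <- 2 * .25
    else if (factors[i, 2] >= .140) factors[i, 2] <- 1 * .25
    else factors[i, 2] <- 0
    if (factors[i, 3] >= .03) factors[i, 3] <- 2 * .30
    else if (factors[i, 3] >= .015) factors[i, 3] <- 1 * .30
    else factors[i, 3] <- 0
  }
  return(factors)   # hand the modified copy back to the caller
}

# Hypothetical three-row stand-in for factor_nonagg
toy <- data.frame(
  f1 = c(95, 70, 10),
  f2 = c(0.20, 0.15, 0.10),
  f3 = c(0.05, 0.02, 0.01)
)

toy <- category(toy)   # assignment makes the change stick
toy
```

After the assignment, the first row of toy holds the top score in each column and the last row holds zeros.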
Upvotes: 0