Ved Gupta

Reputation: 303

Variable scope in R

I have a data.frame named factor_nonagg with 50 rows and 3 columns. I wrote a function, category(), that takes an argument factors and modifies it. When I pass the data.frame to this function, the data.frame itself is unchanged afterwards. How can I make the changes to my data.frame permanent?

n=50
category=function(factors){
  for(i in 1:n){
    if(factors[i,1]>=90) factors[i,1]<-2*.45
    else if(factors[i,1]>=65) factors[i,1]<-1*.45
    else factors[i,1]<-0

    if(factors[i,2]>=.190) factors[i,2]<-2*.25
    else if(factors[i,2]>=.140) factors[i,2]<-1*.25
    else factors[i,2]<-0

    if(factors[i,3]>=.03) factors[i,3]<-2*.30
    else if(factors[i,3]>=.015) factors[i,3]<-1*.30
    else factors[i,3]<-0
  }
}
category(factor_nonagg)

Upvotes: 1

Views: 122

Answers (4)

IRTFM

Reputation: 263481

Looping through the rows of data frames is going to be painfully slow. Here is a vectorized approach. It is admittedly untested in the absence of your data, but it runs without error on the test data offered by dardisco:

category=function(factors){
  factors[[1]] <- 0.45*(0:2)[ findInterval(factors[[1]], c(-Inf, 65, 90, Inf) )]
  factors[[2]] <- 0.25*(0:2)[ findInterval(factors[[2]], c(-Inf, 0.140, 0.190, Inf) )]
  factors[[3]] <- 0.30*(0:2)[ findInterval(factors[[3]], c(-Inf, 0.015, 0.03, Inf) )]
  return(factors)
}

And, of course, as in functional languages generally, factor_nonagg is not modified except by an assignment:

category(factor_nonagg)                   # returns a modified copy; factor_nonagg is unchanged
factor_nonagg <- category(factor_nonagg)  # replacement occurs

findInterval is a very useful vector-oriented function. It can return a group number for each value or, as in this example, supply an index for selecting from a set of character or numeric values.
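To see how the indexing works, here is a small check of findInterval() on the breakpoints used for the first column. The test values are made up; they are chosen to sit on and around the 65 and 90 cut-offs. Intervals are closed on the left by default, so a value exactly equal to a breakpoint moves up a group, which matches the >= comparisons in the question.

```r
# Breakpoints for the first column: (-Inf, 65), [65, 90), [90, Inf)
breaks <- c(-Inf, 65, 90, Inf)
x <- c(10, 64.9, 65, 89, 90, 120)  # made-up test values

# Index of the interval each value falls in (1, 2 or 3)
idx <- findInterval(x, breaks)
print(idx)  # 1 1 2 2 3 3

# Use the index to pick 0, 1 or 2 and scale it, as in category()
scores <- 0.45 * (0:2)[idx]
print(scores)  # 0, 0, 0.45, 0.45, 0.9, 0.9
```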

Upvotes: 1

dardisco

Reputation: 5274

You could approach it like this:

set.seed(1)
df1 <- data.frame(
    f1 = sample(seq(150), size=50, replace=TRUE),
    f2 = sample(seq(250) / 1000, size=50, replace=TRUE),
    f3 = sample(seq(50) / 1000, size=50, replace=TRUE)
    )
### vals1 = values (thresholds)
### mult1 = multiplier
fun1 <- function(x, vals1, mult1){
    if (x >= max(vals1)) return(mult1*2)
    if (x >= min(vals1) && x < max(vals1)) return(mult1)
    return(0)
    }
### braces make the three assignments one expression for within()
df1 <- within(df1, {
       f1 <- sapply(f1, fun1, vals1=c(90, 65), mult1=0.45)
       f2 <- sapply(f2, fun1, vals1=c(0.19, 0.14), mult1=0.25)
       f3 <- sapply(f3, fun1, vals1=c(0.03, 0.015), mult1=0.3)
       })

This avoids the for loop (although short loops are not necessarily a bad thing), saves typing, and makes it easier to generalize if you want to change the thresholds or multipliers. I'm using return() in fun1 because it has multiple exit points.
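When using within() to change several columns, note that it evaluates only its expr argument. Further assignments separated by commas are passed to ... and silently ignored, so they must be wrapped in braces to form a single expression. A minimal check on a made-up two-column data frame (the names dat, res1 and res2 are only for illustration):

```r
dat <- data.frame(a = 1:3, b = 1:3)

# Comma-separated: only `a <- a * 10` is used as expr;
# `b <- b * 10` lands in ... and is never evaluated
res1 <- within(dat, a <- a * 10, b <- b * 10)

# Braces turn both assignments into one expression
res2 <- within(dat, {
  a <- a * 10
  b <- b * 10
})

print(res1$b)  # unchanged: 1 2 3
print(res2$b)  # updated: 10 20 30
```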

Upvotes: 0

MrFlick

Reputation: 206566

R does not easily support pass-by-reference behavior for function arguments. When you modify a parameter inside a function, R makes a copy of the object, and the changes last only as long as the function call.
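A minimal illustration of this copy behavior, using a hypothetical function set_first_to_zero() and a made-up one-column data frame. The change made inside the function is visible only in the returned value, not in the caller's object:

```r
# Hypothetical helper: changes the first cell of its argument
set_first_to_zero <- function(d) {
  d[1, 1] <- 0  # modifies the function's local copy only
  d             # return the modified copy
}

dat <- data.frame(x = c(5, 6, 7))
out <- set_first_to_zero(dat)

print(dat$x)  # 5 6 7 (caller's object unchanged)
print(out$x)  # 0 6 7 (modified copy returned)
```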

The usual approach is to have your function return the changed value (return(factors)) and assign that result back to the original variable:

factor_nonagg <- category(factor_nonagg)

Upvotes: 1

abel

Reputation: 490

Your function needs to return the modified data frame. You can do this by adding

return(factors)

just before the last closing curly bracket of your function definition.
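Putting the return(factors) line together with assigning the result back, the question's loop could look like the sketch below. It also replaces the global n with seq_len(nrow(factors)), which is an assumption not in the original question, and the three-row data frame is made up for illustration (the real factor_nonagg has 50 rows).

```r
category <- function(factors) {
  # loop over however many rows the data frame has
  for (i in seq_len(nrow(factors))) {
    if (factors[i, 1] >= 90) factors[i, 1] <- 2 * .45
    else if (factors[i, 1] >= 65) factors[i, 1] <- 1 * .45
    else factors[i, 1] <- 0

    if (factors[i, 2] >= .190) factors[i, 2] <- 2 * .25
    else if (factors[i, 2] >= .140) factors[i, 2] <- 1 * .25
    else factors[i, 2] <- 0

    if (factors[i, 3] >= .03) factors[i, 3] <- 2 * .30
    else if (factors[i, 3] >= .015) factors[i, 3] <- 1 * .30
    else factors[i, 3] <- 0
  }
  return(factors)  # hand the modified copy back to the caller
}

# Made-up example data
factor_nonagg <- data.frame(
  f1 = c(95, 70, 10),
  f2 = c(0.20, 0.15, 0.01),
  f3 = c(0.04, 0.02, 0.001)
)
factor_nonagg <- category(factor_nonagg)  # assign the result back
print(factor_nonagg)
```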

Upvotes: 0
