Alex V

Reputation: 3

If statement in user defined function within apply in R

Forgive me if this is a blatantly obvious question; I am a beginner R user eager to learn.

I have a data frame of 4 columns and roughly 1.5 million rows of coordinate information, where each row represents a specific location. I would like to pass these data to a function containing a series of if/else statements that determine which region of a larger box each location falls in. For example, a point can be in the center, along the edge of the box (within 1.5 inches), inside the box but neither on the edge nor at the center, or outside the box.

Each if statement tests whether a point lies in a specified area and, if it does, puts a '1' in the corresponding row of another data frame.

Here is a visualization of what I am trying to do:

Take this location data from a data frame called 'dimensions':

 sz_top | sz_bot |   px   |  pz   |
  3.526 |  1.615 | -1.165 | 3.748 |

Run it through these statements (the real statements are much longer), where the 'else' condition means the point is outside the box completely:

if(in center) else if(on edge) else if(in box, but not in center or on edge) else

When the program finds which condition is true, it puts a 1 in ANOTHER data frame called 'call' in the corresponding column (these columns are columns 50-53). This is what the row would look like in the event the code found the point was in the center:

 center | edge | other_in | out |
   1    |  0   |    0     |  0  |

One thing to note that could improve efficiency: the coordinates are also contained in the 'calls' data frame, in columns 22, 23, 26, and 27, but I moved them to 'dimensions' because that was easier for me to work with. This can definitely be changed.

I am now unclear on how to proceed. I have all my if/else statements written, but I do not see how my program will know which row it is on, so that it can mark the corresponding row with the result of the tests.

Please let me know if you would like any more information from me.

Thanks!

EDIT:

Here is a sample of the 'dimensions' data frame:

    sz_top  sz_bot  px      pz
1   3.526   1.615   -1.165  3.748
2   3.29    1.647   -0.412  1.9
3   3.29    1.647   -1.213  1.352
4   3.565   1.75    -1.041  2.419
5   3.565   1.75    -0.357  1.776
6   3.565   1.75    0.838   0.834
7   3.541   1.724   -1.619  3.661
8   3.541   1.724   -2.498  2.421
9   3.541   1.724   -1.673  2.348
10  3.541   1.724   -1.572  2.982
11  3.305   1.5     -1.316  2.842

Here is an example of one of my if statements. The others are fairly similar, just looking at different locations around the box in question:

  if(
    ((as.numeric(as.character(dimensions$px))*12)>= -3)
    &&
      ((as.numeric(as.character(dimensions$px))*12)<= 3)
    &&
      ((as.numeric(as.character(dimensions$pz))*12)<=((as.numeric(as.character(dimensions$sz_top))*12-as.numeric(as.character(dimensions$sz_bot))*12)/2)+(as.numeric(as.character(dimensions$sz_bot))*12)+3)
    &&
      ((as.numeric(as.character(dimensions$pz))*12)>=((as.numeric(as.character(dimensions$sz_top))*12-as.numeric(as.character(dimensions$sz_bot))*12)/2)+(as.numeric(as.character(dimensions$sz_bot))*12)-3)
  ){return(1)
  } 

Upvotes: 0

Views: 1871

Answers (2)

cryo111

Reputation: 4474

I would proceed as follows (I have slightly changed your example):

First preallocate an empty call dataframe.

call=data.frame(matrix(NA,nrow=nrow(dimensions),ncol=4))
colnames(call)=paste("Q",1:4,sep="")

Use with() so that the columns of dimensions can be referred to simply as px and pz, which makes the code easier to read.

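with(dimensions, {
  # Each ifelse() is vectorized: it returns one 0/1 value per row.
  call$Q1 <<- ifelse(px > 0 & pz > 0, 1, 0)
  call$Q2 <<- ifelse(px < 0 & pz > 0, 1, 0)
  call$Q3 <<- ifelse(px < 0 & pz < 0, 1, 0)
  call$Q4 <<- ifelse(px > 0 & pz < 0, 1, 0)
})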

Please note the <<- instead of <-. with() evaluates its expression in a temporary environment built from dimensions, so a plain <- would only create a local copy of call that is discarded when with() returns. With <<-, R searches the enclosing environments for call and modifies it there.

BTW: If performance is an issue, have a look at the data.table package.
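
A variant that avoids <<- altogether is to let with() return each vector and do the assignment outside, in the calling environment. This is only a sketch: the two-row dimensions data frame is made up for illustration, and as.integer() on a logical vector stands in for ifelse(..., 1, 0).

```r
# Made-up sample data, for illustration only.
dimensions <- data.frame(px = c(-1.165, 0.838), pz = c(3.748, -0.834))

# Preallocate the result data frame, one row per row of dimensions.
call <- data.frame(matrix(NA, nrow = nrow(dimensions), ncol = 4))
colnames(call) <- paste("Q", 1:4, sep = "")

# with() returns the value of its expression, so a plain <- is enough here.
call$Q1 <- with(dimensions, as.integer(px > 0 & pz > 0))
call$Q2 <- with(dimensions, as.integer(px < 0 & pz > 0))
call$Q3 <- with(dimensions, as.integer(px < 0 & pz < 0))
call$Q4 <- with(dimensions, as.integer(px > 0 & pz < 0))
```

Row i of call then describes row i of dimensions, because every comparison is evaluated for all rows at once and keeps their order.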

Upvotes: 0

jbaums

Reputation: 27388

If I understand correctly, the following will return a numeric vector of ones and zeros that you can slot into the appropriate column of calls.

dimensions <- read.table(text='sz_top  sz_bot  px  pz
1   3.526   1.615   -1.165  3.748
2   3.29    1.647   -0.412  1.9
3   3.29    1.647   -1.213  1.352
4   3.565   1.75    -1.041  2.419
5   3.565   1.75    -0.357  1.776
6   3.565   1.75    0.838   0.834
7   3.541   1.724   -1.619  3.661
8   3.541   1.724   -2.498  2.421
9   3.541   1.724   -1.673  2.348
10  3.541   1.724   -1.572  2.982
11  3.305   1.5     -1.316  2.842', header=TRUE, row.names=1)


as.numeric(
  dimensions$px*12 >= -3
  & dimensions$px*12 <= 3
  & dimensions$pz*12 <= 
    (dimensions$sz_top*12 - dimensions$sz_bot*12)/2 + (dimensions$sz_bot*12) + 3
  & dimensions$pz*12 >= 
    (dimensions$sz_top*12 - dimensions$sz_bot*12)/2 + (dimensions$sz_bot*12) - 3)

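The single ampersand (&) is vectorized: R evaluates the condition element by element and returns one logical value per row of the data.frame. The double ampersand (&&) is meant for single values, as in an if() condition, and stops evaluating as soon as the result is known, so it cannot test every row of a column.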
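
To make the difference concrete, here is a small sketch with made-up logical vectors (a and b are illustrative names, not from the question):

```r
a <- c(TRUE, FALSE, TRUE)
b <- c(TRUE, TRUE, FALSE)

# & works element by element and returns one result per position.
elementwise <- a & b

# && is for single values, e.g. inside if(). It short-circuits: the
# right-hand side is never evaluated when the left-hand side is FALSE.
single <- FALSE && stop("never reached")
```

Here elementwise has three elements, one per position, while single is one logical value. As far as I know, recent R versions raise an error when && is given a vector longer than one, whereas older versions silently used only the first element.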

I've removed as.numeric and as.character for clarity (not sure why these are necessary anyway... were these data read in as factors? If so, perhaps try stringsAsFactors = FALSE).
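
Putting the pieces together, the whole if / else if chain could be written with vectorized logic along the following lines. This is a rough sketch under stated assumptions, not the asker's actual tests: the three rows are made up, half_width is a hypothetical box half-width, and the "edge" rule (inside the box and within 1.5 inches of any side) is one guess at what the question describes. Only the center test is taken from the question. The factor conversion is done once up front instead of inside every comparison.

```r
# Three made-up rows in feet, using the column names from the question.
dimensions <- data.frame(
  sz_top = c("3.5", "3.5", "3.5"),
  sz_bot = c("1.5", "1.5", "1.5"),
  px     = c("0.1", "0.65", "2.0"),
  pz     = c("2.5", "2.0", "2.5"),
  stringsAsFactors = TRUE   # simulate columns that were read in as factors
)

# Convert factor columns to numbers once, via character to keep the values.
to_num <- function(x) as.numeric(as.character(x))
d <- as.data.frame(lapply(dimensions, to_num))

# Work in inches, as in the question.
x   <- d$px * 12
z   <- d$pz * 12
top <- d$sz_top * 12
bot <- d$sz_bot * 12
mid <- (top + bot) / 2   # same as (top - bot) / 2 + bot

half_width <- 8.5   # ASSUMPTION: hypothetical half-width of the box, in inches
edge_band  <- 1.5   # edge distance mentioned in the question

# One logical value per row for each region.
center    <- abs(x) <= 3 & abs(z - mid) <= 3
in_box    <- abs(x) <= half_width & z >= bot & z <= top
near_edge <- in_box & (half_width - abs(x) <= edge_band |
                       z - bot <= edge_band | top - z <= edge_band)

# Reproduce the if / else if priority: the first matching test wins.
edge     <- !center & near_edge
other_in <- !center & !edge & in_box
out      <- !center & !edge & !other_in

result <- data.frame(center   = as.integer(center),
                     edge     = as.integer(edge),
                     other_in = as.integer(other_in),
                     out      = as.integer(out))
```

Because result has one row per row of dimensions, in the same order, it could then be copied into the target columns, for example with calls[, 50:53] <- result, assuming calls has the same number of rows.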

Upvotes: 1
