lists, data.frame loops & indexing questions

Question

I have some Python and NumPy experience but have never used R before. I'm trying to help my wife with her R project: she has a much better grasp of statistics than I do, but little programming experience. I'm finding the syntax and documentation of R very confusing.

What we originally wanted to do was loop through a large data.frame, do a number of spatial calculations involving prior and subsequent records, a little trig, and some quality checks on the data, and generate a new object with the results. We will then import this new data into a GIS.

EDIT: Just to be clear, the calculations in this example are just a placeholder, and are nothing like the actual calculations I needed to do.

Initially I tried something like this:

> result = list()
> for (i in 1:5) {
+   #Calculate some dummy data. The actual calculations are much more involved
+   param1 = i * 1.1
+   param2 = i * 5.3
+   param3 = i + a_value
+   # Now append these calculated values to some sort of object
+   sample = list(param1=param1,param2=param2,param3=param3)
+   result <- rbind(result,sample)
+ }
> print(result)
       param1 param2 param3
sample 1.1    5.3    12    
sample 2.2    10.6   13    
sample 3.3    15.9   14    
sample 4.4    21.2   15    
sample 5.5    26.5   16

The "sample" column seems unnecessary, but oh well, it looks good. Now to reference a single column...

> result$param2
NULL

??? I tried getting rid of 'sample' with:

+   result <- rbind(result,list(param1=param1,param2=param2,param3=param3))
>
     param1 param2 param3
[1,] 1.1    5.3    12    
[2,] 2.2    10.6   13    
[3,] 3.3    15.9   14    
[4,] 4.4    21.2   15    
[5,] 5.5    26.5   16 
> result$param2
NULL

Perhaps this data frame thing will work. I changed the first line to:

result = data.frame()
>
   param1 param2 param3
2     1.1    5.3     12
21    2.2   10.6     13
3     3.3   15.9     14
4     4.4   21.2     15
5     5.5   26.5     16
> result$param2 # One column
[1]  5.3 10.6 15.9 21.2 26.5
> result[2,] #One row
   param1 param2 param3
21    2.2   10.6     13
> result[3,]$param3 # Single value
[1] 14

So it's working, but I'm not sure what the 21 (row number?) is all about. If I have more rows, the 21st row is '211'.

Could someone tell me why the first case didn't work, what the '21' is all about, and whether there is a better way to do this? Much of what I've read suggests that loops in R are a sign you don't know what you are doing, but the learning curve on the alternatives seems steep. The looping may also be why the script takes an amazingly long time to run, even on a fast machine.

Tyler Rinker · Accepted Answer

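The problem is that R works very differently from most other programming languages. Looping over rows is generally slow in R, especially when the result is grown with rbind on every iteration, because each call copies everything accumulated so far. Instead, use vectorization, which is what makes R easy to work with (though different from other languages). For your problem I'd probably do: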

i=1:5
data.frame(param1 = i * 1.1, param2 = i * 5.3, param3 = i*2+9)

Also check out apply, lapply, sapply, and ifelse, and note that many built-in functions are already vectorized and work directly on whole vectors.
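Since the real task involves prior and subsequent records, here is a minimal sketch of how that kind of calculation might be vectorized. The data frame pts, its columns x and y, and the 4.5 threshold are all made up for illustration; they are not from the question.

```r
# Hypothetical planar coordinates, one row per record
pts <- data.frame(x = c(0, 3, 3, 6), y = c(0, 4, 8, 12))

# diff() returns each value minus the previous one, so no loop is needed
dx <- diff(pts$x)
dy <- diff(pts$y)

# Distance from the previous record; the first row has no predecessor, hence NA
pts$dist_prev <- c(NA, sqrt(dx^2 + dy^2))

# Direction of each step in degrees, counter-clockwise from the x axis
pts$angle_prev <- c(NA, atan2(dy, dx) * 180 / pi)

# ifelse() applies a quality check to every row at once
# (the 4.5 limit is an arbitrary example threshold)
pts$flag <- ifelse(is.na(pts$dist_prev) | pts$dist_prev <= 4.5, "ok", "check")

print(pts)
```

The same shifting idea works for subsequent records: c(diff(x), NA) lines each row up with the step to the next one.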

If you really wanted to fix up what you have you could use the following:

 result = list()
 for (i in 1:5) {
   #Calculate some dummy data. The actual calculations are much more involved
   param1 = i * 1.1
   param2 = i * 5.3
   param3 = 2*i+9
   # Now append these calculated values to some sort of object
   sample = list(param1=param1,param2=param2,param3=param3)
   result <- data.frame(rbind(result,sample))
   rownames(result) <- 1:nrow(result)
 }
 print(result)
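As for why the first attempt returned NULL: rbind on plain lists builds a matrix whose cells are list elements, not a data.frame, so $ has no column names to match. The odd labels such as 21 are automatically generated row names, not row numbers. The sketch below illustrates both points and shows one common alternative to growing the result inside the loop: collect the rows in a preallocated list and bind them once at the end. The value of a_value is an assumption, chosen because it reproduces the param3 column printed in the question.

```r
a_value <- 11  # assumed; the question never shows how a_value was defined

# What the first attempt built: a matrix of mode list
m <- rbind(list(param1 = 1.1, param2 = 5.3,  param3 = 1 + a_value),
           list(param1 = 2.2, param2 = 10.6, param3 = 2 + a_value))
is_mat  <- is.matrix(m)          # TRUE: it is a matrix, not a data.frame
no_col  <- is.null(m$param2)     # TRUE: the matrix has dimnames, not names
col2    <- unlist(m[, "param2"]) # matrix-style indexing does reach the column

# Alternative: keep the loop, but store each row and bind only once
rows <- vector("list", 5)        # preallocate one slot per iteration
for (i in 1:5) {
  rows[[i]] <- data.frame(param1 = i * 1.1,
                          param2 = i * 5.3,
                          param3 = i + a_value)
}
result <- do.call(rbind, rows)   # a single rbind over all rows
rownames(result) <- NULL         # reset row names to the default 1..n

print(result)
print(result$param2)
```

Binding once avoids recopying the accumulated result on every iteration, which tends to matter once the data.frame is large.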
