JMac

Reputation: 3

Creating new columns using a for loop

I am new to R (Economist with background in Stata) and I am having trouble getting a nested for loop to work for me. I know the issue is that I don't have a good understanding of how to use the loop counter as part of a variable name.

A bit of background. I have a data frame with data on average rental rates for homes of different sizes (1 bedroom, 2 bedroom, etc.) and data on annual earnings (mean, median, and various percentiles). I am trying to generate a series of new columns containing the ratio of these two things (rental rate / mean earnings).

Specifically my variables are:

So you see I need to generate 24 new columns of cost/earnings data. I could write out 24 lines of code but I don't want to. More importantly, I want to learn an efficient way of doing this in R. In Stata I could do this very simply using a nested for loop, but I can't get it to work in R. Here is my code so far.

for (i in 1:4) {
    stat <- c("median", "mean", "p10", "p25", "p75","p90")
    for (x in stat) {
        df$beds[i]_[x] <- round((df$beds[i]/df$[x]),digits=3)
    }
}

When I run this code the error I get is

Error: unexpected input in:
"    for (x in stat) {
    df$beds[i]_"
>     }
Error: unexpected '}' in "    }"
> }
Error: unexpected '}' in "}"

I have tried to use the double brackets [[]] but that didn't change the results. If anyone has some insight into why the dynamic variable names aren't working, please let me know. Even better, since I guess loops are evil in R, if anyone knows a way to use lapply to get this done, I would love to hear that too.


EDIT

Thanks @Spacedman for the comment. I think I am getting what you're saying. So does that mean that there simply isn't any way to do what I want to do in R?

var1 <- c("beds1", "beds2")
var2 <- c("mean", "median")

for (i in 1:2) {
    for (j in 1:2) {
        df$var1[i]_var2[j] <- df$var1[i]/df$var2[j]
    }
}

I think this should grab the elements of the lists var1 and var2 so that when i=1 and j=1, df$var1[i]/df$var2[j] should mean df$beds1/df$mean. Or would R get mad and think I was trying to divide strings?


FINAL EDIT WITH ANSWER FROM @SPACEDMAN

Thanks @Spacedman. I loved your spoiler and thank you for providing additional help. I didn't fully grasp the difference between the two ways of referring to columns after your last post, but I think I have a better idea now. I did a bit of tweaking and now I have something that works perfectly. Thanks again!

beds <- c("beds1", "beds2", "beds3", "beds4")
stat <- c("median", "mean", "p10", "p25", "p75","p90")

for(i in beds){
    for(x in stat){
        res = paste0(i,"_",x)
        df[[res]]=round(df[[i]]/df[[x]],digits=3)
    }
}

Upvotes: 0

Views: 123

Answers (1)

Spacedman

Reputation: 94172

R is not a macro expansion language like other languages you might be used to.

x[i], if i=123, does not "expand" into x123. It gets the value of the 123rd element of the vector, x.

So df$beds[i] tries to get the i'th element of a vector df$beds.
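A minimal illustration of the difference, using a made-up vector x (not part of the question's data): indexing looks up a value, while building a name from the counter is a separate, explicit step.

```r
# Hypothetical vector, only for illustration
x <- c(10, 20, 30)
i <- 2

# Indexing: returns the 2nd element of x
second <- x[i]

# Constructing a name from i has to be done explicitly, and it
# yields a character string, not a variable lookup
nm <- paste0("x", i)

print(second)
print(nm)
```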

You need to know two things:

  1. How to construct strings from other strings.

For this you can use paste0:

> for(i in 1:4){
+  print(paste0("beds",i))
+ }
[1] "beds1"
[1] "beds2"
[1] "beds3"
[1] "beds4"

  2. How to access columns by names.

For this you can use double square brackets. In a list:

> z = list()
> n = "thing"

Double square brackets evaluate their index and use that. So:

> z[[n]] = 99

Will set z$thing, but dollar sign indexing is literal, so:

> z$n = 123

will set z$n:

> z
$thing
[1] 99

$n
[1] 123
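The same distinction applies to data frame columns. Here is a small sketch with a made-up data frame d (the column values are invented for illustration):

```r
# Hypothetical data frame, only for illustration
d <- data.frame(beds1 = c(1, 2, 3), mean = c(8, 4, 2))
col <- "beds1"

# [[ ]] evaluates col, so this is the beds1 column
by_value <- d[[col]]

# $ is literal: it looks for a column actually named "col"
# and finds none, so the result is NULL
by_literal <- d$col

# Assignment follows the same rule: build the name, then use [[ ]]
new_name <- paste0(col, "_mean")
d[[new_name]] <- d[[col]] / d[["mean"]]

print(names(d))
```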

Hopefully that's enough of a hint to get you through. It should all be covered in basic R tutorials online.

Spoiler

If you want to work out how to do it yourself, look away now...

First, let's create a sample data frame - you should include something like this in your question so we have common test data to work on. I'll just have three beds and two stats:

> df = data.frame(
     beds1=c(1,2,3),
     beds2=c(5,2,3),
     beds3=c(6,6,6),
     mean=c(8,4,3),
     median=c(1,7,4))
> df
  beds1 beds2 beds3 mean median
1     1     5     6    8      1
2     2     2     6    4      7
3     3     3     6    3      4

Now the work. We loop over the bed number and the character stats. The bed column name is stored in bed by pasting "beds" to the number i. We compute the name of the result column (res) for a given bed number and stat by pasting "beds" to i and "_" and the name of the stat in x.

Then we set the new result column to the beds column divided by the stat column. We use double square brackets to get the columns by name:

> for(i in 1:3){
  stats=c("mean","median")
  for(x in stats){
    bed = paste0("beds",i)
    res = paste0("beds",i,"_",x)
    df[[res]]=round(df[[bed]]/df[[x]],digits=3)
  }
 }

Resulting in....

> df
  beds1 beds2 beds3 mean median beds1_mean beds1_median beds2_mean beds2_median
1     1     5     6    8      1      0.125        1.000      0.625        5.000
2     2     2     6    4      7      0.500        0.286      0.500        0.286
3     3     3     6    3      4      1.000        0.750      1.000        0.750
  beds3_mean beds3_median
1       0.75        6.000
2       1.50        0.857
3       2.00        1.500
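The question also asks about an lapply-style approach. The nested loop is perfectly fine here, but as a sketch, the same columns can be built with expand.grid() and Map() (a relative of lapply that walks several vectors in parallel). The names beds, stats, combos and ratios are just illustrative choices, and the sample data is rebuilt so the snippet runs on its own:

```r
# Same sample data as above
df <- data.frame(
  beds1  = c(1, 2, 3),
  beds2  = c(5, 2, 3),
  beds3  = c(6, 6, 6),
  mean   = c(8, 4, 3),
  median = c(1, 7, 4)
)

beds  <- paste0("beds", 1:3)
stats <- c("mean", "median")

# One row per bed/stat pair. The first argument varies fastest,
# so listing stat first reproduces the nested loop's column order.
combos <- expand.grid(stat = stats, bed = beds, stringsAsFactors = FALSE)

# Compute each ratio column, then name the results and attach them
ratios <- Map(function(b, s) round(df[[b]] / df[[s]], digits = 3),
              combos$bed, combos$stat)
names(ratios) <- paste0(combos$bed, "_", combos$stat)
df[names(ratios)] <- ratios

print(df)
```

Whether this reads better than the explicit loop is a matter of taste; for a few dozen columns there is no meaningful speed difference.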

Upvotes: 2
