Reputation: 3
I am new to R (Economist with background in Stata) and I am having trouble getting a nested for loop to work for me. I know the issue is that I don't have a good understanding of how to use the loop counter as part of a variable name.
A bit of background. I have a data frame with data on average rental rates for homes of different sizes (1 bedroom, 2 bedroom, etc.) and data on annual earnings (mean, median, and various percentiles). I am trying to generate a series of new columns containing the ratio of these two things (rental rate / earnings statistic).
Specifically my variables are:
beds1, beds2, beds3, beds4
mean, median, p10, p25, p75, p90
So you see I need to generate 24 new columns of cost/earnings data. I could write out 24 lines of code but I don't want to. More importantly, I want to learn an efficient way of doing this in R. In Stata I could do this very simply using a nested for loop, but I can't get it to work in R. Here is my code so far.
for (i in 1:4) {
stat <- c("median", "mean", "p10", "p25", "p75","p90")
for (x in stat) {
df$beds[i]_[x] <- round((df$beds[i]/df$[x]),digits=3)
}
}
When I run this code the error I get is
Error: unexpected input in:
" for (x in stat) {
df$beds[i]_"
> }
Error: unexpected '}' in " }"
> }
Error: unexpected '}' in "}"
I have tried to use the double brackets [[]] but that didn't change the results. If anyone has some insight into why the dynamic variable names aren't working, please let me know. Even better, since I guess loops are evil in R, if anyone knows a way to use lapply to get this done, I would love to hear that too.
EDIT
Thanks @Spacedman for the comment. I think I am getting what you're saying. So does that mean that there simply isn't any way to do what I want to do in R?
var1 <- c("beds1", "beds2")
var2 <- c("mean", "median")
for (i in 1:2) {
for (j in 1:2) {
df$var1[i]_var2[j] <- df$var1[i]/df$var2[j]
}
}
I think this should grab the elements of the lists var1 and var2 so that when i=1 and j=1, df$var1[i]/df$var2[j] should mean df$beds1/df$mean. Or would R get mad and think I was trying to divide strings?
FINAL EDIT WITH ANSWER FROM @SPACEDMAN
Thanks @Spacedman. I loved your spoiler and thank you for providing additional help. I didn't fully grasp the difference between the two ways of referring to columns after your last post, but I think I have a better idea now. I did a bit of tweaking and now I have something that works perfectly. Thanks again!
beds <- c("beds1", "beds2", "beds3", "beds4")
stat <- c("median", "mean", "p10", "p25", "p75","p90")
for(i in beds){
for(x in stat){
res = paste0(i,"_",x)
df[[res]]=round(df[[i]]/df[[x]],digits=3)
}
}
Upvotes: 0
Views: 123
Reputation: 94172
R is not a macro expansion language like other languages you might be used to. x[i], if i=123, does not "expand" into x123. It gets the value of the 123rd element of the vector x.
So df$beds[i] tries to get the i'th element of a vector df$beds.
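A small illustration of the difference, using made-up values (the names x, x2 and i are only for demonstration and are not from the question):

```r
x  <- c(10, 20, 30)   # a vector with three elements
x2 <- 99              # an unrelated variable whose name happens to end in 2
i  <- 2

# Indexing: returns the second element of x (20).
# No text substitution takes place, so this never refers to x2.
x[i]
```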
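You need to know two things: how to build a column name as a string, and how to access a column by a name held in a variable.
For the first, you can use paste0: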
> for(i in 1:4){
+ print(paste0("beds",i))
+ }
[1] "beds1"
[1] "beds2"
[1] "beds3"
[1] "beds4"
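Because paste0 is vectorised, the same names can also be built in a single call without a loop. A minimal sketch (the variable name beds is just an illustrative choice):

```r
# paste0 recycles "beds" against each element of 1:4
beds <- paste0("beds", 1:4)
beds
```

This should give a character vector of the four column names, which can then be looped over directly.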
To get or set an element whose name is held in a variable, you can use double square brackets. In a list:
> z = list()
> n = "thing"
Double square brackets evaluate their index and use the result. So:
> z[[n]] = 99
will set z$thing, but dollar sign indexing is literal, so:
> z$n = 123
will set z$n:
> z
$thing
[1] 99
$n
[1] 123
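The same distinction applies to data frame columns. A minimal sketch with a made-up one-column data frame (d, n and the "_double" suffix are illustrative assumptions, not from the question):

```r
d <- data.frame(a = c(1, 2, 3))
n <- "a"                 # the column name, held in a variable

d[[n]]                   # uses the value of n: the column "a"
d$n                      # looks for a column literally called "n": NULL

# Build a new column name as a string and assign to it
d[[paste0(n, "_double")]] <- d[[n]] * 2
names(d)                 # now includes "a_double"
```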
Hopefully that's enough hints to get you through. It should all be covered in basic R tutorials online.
If you want to work out how to do it yourself, look away now...
First, let's create a sample data frame. You should include something like this in your question so we have common test data to work on. I'll just have three beds and two stats:
> df = data.frame(
beds1=c(1,2,3),
beds2=c(5,2,3),
beds3=c(6,6,6),
mean=c(8,4,3),
median=c(1,7,4))
> df
  beds1 beds2 beds3 mean median
1     1     5     6    8      1
2     2     2     6    4      7
3     3     3     6    3      4
Now the work. We loop over the bed number and the character stats. The bed column name is stored in bed by pasting "beds" to the number i. We compute the name of the result column (res) for a given bed number and stat by pasting "beds" to i and "_" and the name of the stat in x.
Then we set the new column to the bed column divided by the stat column. We use [[z]] to get the columns by name:
> for(i in 1:3){
stats=c("mean","median")
for(x in stats){
bed = paste0("beds",i)
res = paste0("beds",i,"_",x)
df[[res]]=round(df[[bed]]/df[[x]],digits=3)
}
}
Resulting in....
> df
  beds1 beds2 beds3 mean median beds1_mean beds1_median beds2_mean beds2_median
1     1     5     6    8      1      0.125        1.000      0.625        5.000
2     2     2     6    4      7      0.500        0.286      0.500        0.286
3     3     3     6    3      4      1.000        0.750      1.000        0.750
  beds3_mean beds3_median
1       0.75        6.000
2       1.50        0.857
3       2.00        1.500
>
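Since the question also asked about avoiding explicit loops, here is one possible loop-free sketch on the same sample data. It is not necessarily faster or clearer than the nested loop; combos and ratios are illustrative names, and expand.grid plus Map is just one of several ways to do this:

```r
df <- data.frame(
  beds1 = c(1, 2, 3), beds2 = c(5, 2, 3), beds3 = c(6, 6, 6),
  mean = c(8, 4, 3), median = c(1, 7, 4)
)
beds  <- paste0("beds", 1:3)
stats <- c("mean", "median")

# Every bed/stat pairing, one row per combination (stat varies fastest)
combos <- expand.grid(stat = stats, bed = beds, stringsAsFactors = FALSE)

# One rounded ratio vector per combination, looked up by column name
ratios <- Map(function(bed, stat) round(df[[bed]] / df[[stat]], digits = 3),
              combos$bed, combos$stat)
names(ratios) <- paste0(combos$bed, "_", combos$stat)

# Add all the new columns at once
df[names(ratios)] <- ratios
```

The new columns should come out in the same order as in the loop version (beds1_mean, beds1_median, beds2_mean, and so on).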
Upvotes: 2