Reputation: 419
I have a specific programming question concerning R. I want to apply a custom function over a whole data set, but one of the values passed to the function should depend on which group each row belongs to. Here is a dataset that is similar to the one I'm working with
set.seed(123)
df <- data.frame(group = c(rep("one", 10), rep("two", 9), rep("three", 11)),
slot = c(1:10, 1:9, 1:11),
x = sample(100, 30))
And the function
RI_fun <- function(x, y) {
((x - y)/ y) * 100
}
The real dataset is larger but the structure is the same. A little info on the real dataset: it is a series of measurements (slots) on a sample (group), where I want the first measurement (slot == 1) to be y in the custom function (RI_fun).
I want to make a new column that holds the output of the custom function, where x = df$x and y is the x value where df$slot == 1 within each group.
I have tried to write a for loop, but without success. My idea was to make the y value an ifelse statement that checks df$group and picks df$x where slot == 1 and group equals the group of the row currently being processed.
Here is my attempt:
for (i in seq_along(df$group)) {
  RI[i] = RI_fun(x = df$x[i],
                 y = (ifelse(df$group == df$group[i],
                             df$x[df$slot == 1 & df$group == df$group[i]],
                             NA)))
}
However the output is:
[1] 0.00000 172.41379 41.37931 196.55172 213.79310 -82.75862 72.41379 186.20690 75.86207 44.82759 NA
[12] NA NA NA NA NA NA NA NA NA NA NA
[23] NA NA NA NA NA NA NA NA
When I manually checked what the output should be, the for loop turned out to be correct up to [11], after which it only returns NA. I've tried some other for loops similar to this one, but this is the one that got closest to the desired output.
Any help would be appreciated. If I wasn't clear enough, please ask and I'll try to clarify.
Upvotes: 1
Views: 429
Reputation: 5776
Great question, and nicely formatted with a reproducible example! Kudos!
In R, you generally don't need to bother using loops. R is inherently vectorized, so we can express ourselves in terms of vectors. Moving along to data.frames, the idea is the same, and by adding the package dplyr we get some easy functionality.
First, I demonstrate what you want:
library(dplyr)
df %>% group_by(group) %>%
mutate(y=x[slot==1])
as.data.frame(.Last.value)
group slot x y
1 one 1 30 30
2 one 2 72 30
3 one 3 88 30
4 one 4 5 30
5 one 5 55 30
6 one 6 42 30
7 one 7 11 30
8 one 8 53 30
9 one 9 73 30
10 one 10 87 30
11 two 1 52 52
12 two 2 82 52
13 two 3 78 52
14 two 4 59 52
15 two 5 12 52
16 two 6 95 52
17 two 7 1 52
18 two 8 70 52
19 two 9 66 52
20 three 1 69 69
21 three 2 79 69
22 three 3 80 69
23 three 4 21 69
24 three 5 94 69
25 three 6 75 69
26 three 7 25 69
27 three 8 15 69
28 three 9 74 69
29 three 10 31 69
30 three 11 43 69
So, we can confirm that we get the correct x and y values. Try removing the group_by line and see what happens.
Satisfied that we are obtaining the relevant x and y values, plug in your function:
df %>% group_by(group) %>%
mutate(RI=RI_fun(x, x[slot==1]))
If you did try removing the group_by line, you got an error. That's because mutate expects either a single value (recycled for the entire column) or one value per row of the column. Without grouping, x[slot == 1] has one value per group, which is neither. So what happens if you have multiple rows with slot == 1 in one group? Well, you'll have to decide how to deal with the deviation from your requirements.
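One possible way to handle that deviation, as a sketch rather than a recommendation: take the first slot == 1 match explicitly, so that mutate always receives a single value per group. The toy data frame below is hypothetical and only illustrates the two edge cases (two matches in one group, no match in the other).

```r
library(dplyr)

RI_fun <- function(x, y) ((x - y) / y) * 100

# Hypothetical toy data: group "a" has two slot == 1 rows, group "b" has none
toy <- data.frame(group = c("a", "a", "a", "b", "b"),
                  slot  = c(1, 1, 2, 2, 3),
                  x     = c(10, 20, 30, 40, 50))

result <- toy %>%
  group_by(group) %>%
  # [1] keeps the first match; with no match it yields NA for the whole group
  mutate(RI = RI_fun(x, x[slot == 1][1])) %>%
  ungroup()

result
```

Whether the first match is the right baseline, or whether duplicates should rather stop the analysis, depends on your data; this only shows that the choice has to be made explicitly.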
EDIT:
The reason your for-loop isn't working as expected is the ifelse at y. Simply replace it with
for (i in seq_along(df$group)) {
RI[i] = RI_fun(x = df$x[i],
y = df$x[df$slot == 1 & df$group == df$group[i]])
}
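and it should work just fine.
This is because ifelse is vectorized: for each element of the test (first) argument (df$group == df$group[i]) it returns the corresponding element of either the yes (second) or no (third) argument. The result is therefore a vector with one entry per row of df, and assigning it to RI[i] keeps only its first element, which is NA whenever row i is not in the same group as row 1.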
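To see this in isolation, here is a small sketch with a hypothetical four-row stand-in for df (the vectors group, slot and x below are made up for illustration and are not your data):

```r
# Hypothetical four-row stand-in for df, small enough to inspect by eye
group <- c("one", "one", "two", "two")
slot  <- c(1, 2, 1, 2)
x     <- c(29, 79, 87, 98)

i <- 3  # a row in group "two"

# ifelse() returns one element per element of the test, not a single value
y <- ifelse(group == group[i],
            x[slot == 1 & group == group[i]],
            NA)
y       # NA NA 87 87
y[1]    # NA, because row 1 is not in the same group as row i

# Dropping ifelse() leaves a single baseline value
y_fixed <- x[slot == 1 & group == group[i]]
y_fixed  # 87
```

Assigning a multi-element vector to the single position RI[i] uses only its first element (R warns about this), so the NA in y[1] is what ends up in RI[i].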
Upvotes: 1
Reputation: 1094
The problem is with the ifelse statement. ifelse(df$group == df$group[i], ...) returns a whole vector, one element per row of df, and the assignment to RI[i] keeps only the first element. That first element comes from the very first comparison, so it is NA as soon as df$group[i] != df$group[1]. You don't need the ifelse, as far as I can see. The following code worked for me (though you should do the manual check to make sure that it's correct).
df <- data.frame(group = c(rep("one", 10), rep("two", 9), rep("three", 11)),
slot = c(1:10, 1:9, 1:11),
x = sample(100, 30))
RI_fun <- function(x, y) {
((x - y)/ y) * 100
}
# Preallocate the result vector, one entry per row of df
RI <- rep(NA_real_, nrow(df))
for (i in seq_along(df$group)) {
RI[i] = RI_fun(x = df$x[i],
y = (df$x[df$slot == 1 & df$group == df$group[i]]))
}
RI
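As a possible stand-in for the manual check, the loop result can be compared against a vectorized lookup. This is only a sketch: the names baseline and RI_check are made up here, set.seed(123) is added just to make the run repeatable, and it assumes every group has exactly one slot == 1 row.

```r
set.seed(123)  # assumption: fixed seed so the check is repeatable
df <- data.frame(group = c(rep("one", 10), rep("two", 9), rep("three", 11)),
                 slot = c(1:10, 1:9, 1:11),
                 x = sample(100, 30))

RI_fun <- function(x, y) ((x - y) / y) * 100

# Loop version, as in the code above
RI <- rep(NA_real_, nrow(df))
for (i in seq_along(df$group)) {
  RI[i] <- RI_fun(x = df$x[i],
                  y = df$x[df$slot == 1 & df$group == df$group[i]])
}

# Vectorized cross-check: one baseline per group, looked up by group name
baseline <- df$x[df$slot == 1]
names(baseline) <- as.character(df$group[df$slot == 1])
RI_check <- RI_fun(df$x, unname(baseline[as.character(df$group)]))

# Both approaches should agree, and every slot == 1 row should give 0
all_equal <- isTRUE(all.equal(RI, RI_check))
all_equal
all(RI[df$slot == 1] == 0)
```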
Upvotes: 0