brendbech

Reputation: 419

Applying one specific value per group to every other value in that group

I have a specific programming question concerning R. I want to apply a custom function over a whole data set, but one of the values passed to the function should change depending on which group a row belongs to. Here is a dataset that is similar to the one I'm working with:

set.seed(123)
df <- data.frame(group = c(rep("one", 10), rep("two", 9), rep("three", 11)),
         slot = c(1:10, 1:9, 1:11),
         x = sample(100, 30))

And the function

RI_fun <- function(x, y) {
((x - y)/ y) * 100
}

The real dataset is larger, but the structure is the same. A little info on the real dataset: it is a series of measurements (slots) on a sample (group), where I want the first measurement (slot == 1) to be y in the custom function (RI_fun).

I want to make a new column that holds the output of the custom function, where x = df$x and y is the x value at df$slot == 1 within each group.

I have tried to make a for loop, but without success. My idea was to make the y value an if else statement where it checked for df$group and applied df$x where slot == 1 and group == group that has just been checked.

Here is my attempt:

for (i in seq_along(df$group)) {
  RI[i] = RI_fun(x = df$x[i],
                 y = (ifelse(df$group == df$group[i],
                             df$x[df$slot == 1 & df$group == df$group[i]],
                             NA)))
}

However the output is:

[1]   0.00000 172.41379  41.37931 196.55172 213.79310 -82.75862  72.41379 186.20690  75.86207  44.82759        NA
[12]        NA        NA        NA        NA        NA        NA        NA        NA        NA        NA        NA
[23]        NA        NA        NA        NA        NA        NA        NA        NA

When I manually checked what the output should be, it showed that the for-loop is correct up to [11], where it stops working. I've tried some other for-loops that are similar to this one, but this is the one where I got closest to the desired output.

Any help would be appreciated. If I wasn't clear enough, please ask and I'll try to make it clearer.

Upvotes: 1

Views: 429

Answers (2)

MrGumble

Reputation: 5776

Great question, and nicely formatted with reproducible example! Kudos!

In R, you generally don't need to bother using loops. R is inherently vectorized, so we can express ourselves in terms of vectors. Moving along to data.frames, the idea is the same, and adding the package dplyr, we get some easy functionality.

First, I demonstrate what you want:

library(dplyr)
df %>% group_by(group) %>%
  mutate(y=x[slot==1])
as.data.frame(.Last.value)
   group slot  x  y
1    one    1 30 30
2    one    2 72 30
3    one    3 88 30
4    one    4  5 30
5    one    5 55 30
6    one    6 42 30
7    one    7 11 30
8    one    8 53 30
9    one    9 73 30
10   one   10 87 30
11   two    1 52 52
12   two    2 82 52
13   two    3 78 52
14   two    4 59 52
15   two    5 12 52
16   two    6 95 52
17   two    7  1 52
18   two    8 70 52
19   two    9 66 52
20 three    1 69 69
21 three    2 79 69
22 three    3 80 69
23 three    4 21 69
24 three    5 94 69
25 three    6 75 69
26 three    7 25 69
27 three    8 15 69
28 three    9 74 69
29 three   10 31 69
30 three   11 43 69

So, we can confirm that we get the correct x and y values. Try removing the group_by line and see what happens.

Satisfied that we are obtaining the relevant x and y values, plug in your function:

df %>% group_by(group) %>%
  mutate(RI=RI_fun(x, x[slot==1]))

If you did try removing the group_by line, you got an error. That's because mutate expects either a single value, which it recycles over the entire column, or one value per row; without grouping, x[slot==1] returns one value per group (three here). So what happens if a group has more than one row with slot == 1? Well, you'll have to decide how to deal with that deviation from your requirements.
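Before deciding, it helps to know whether the problem exists at all. Here is a minimal base-R sketch that counts the slot == 1 rows per group. The data frame df_check and the names n_baseline and bad_groups are made up for illustration; they are not part of the question's data.

```r
# Toy data with a deliberate problem: group "two" has two rows with slot == 1.
# (Hypothetical data, not the data from the question.)
df_check <- data.frame(group = c("one", "one", "two", "two", "two"),
                       slot  = c(1, 2, 1, 1, 2),
                       x     = c(30, 72, 52, 60, 82))

# Count the slot == 1 rows in each group.
n_baseline <- tapply(df_check$slot == 1, df_check$group, sum)
n_baseline

# Groups that do not have exactly one baseline row.
bad_groups <- names(n_baseline)[n_baseline != 1]
bad_groups
```

If bad_groups is empty, x[slot==1] is safe to use inside mutate. If it is not, one possible policy (an assumption on my part, not something your question asks for) is to keep only the first match with x[slot==1][1].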

EDIT:

The reason your for-loop isn't working as expected is due to the ifelse at y. Simply replace with

RI <- numeric(nrow(df))  # preallocate the result vector if RI does not exist yet
for (i in seq_along(df$group)) {
  RI[i] = RI_fun(x = df$x[i],
                 y = df$x[df$slot == 1 & df$group == df$group[i]])
}

and it should work just fine.

This is because ifelse is vectorized: for each element of the test (first) argument (df$group == df$group[i]), it returns the corresponding element of either the yes (second) or the no (third) argument.
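To see this on a small scale, here is a sketch with four rows of toy data (made-up vectors, not your df). The names y_vec and y_direct are only for illustration.

```r
# Toy vectors standing in for df$group, df$slot and df$x (hypothetical data).
group <- c("one", "one", "two", "two")
slot  <- c(1, 2, 1, 2)
x     <- c(30, 72, 52, 82)

i <- 3  # a row that belongs to group "two"

# ifelse() returns one element per element of the test: a vector of length 4.
y_vec <- ifelse(group == group[i],
                x[slot == 1 & group == group[i]],
                NA)
y_vec
# NA NA 52 52

# Only the first element would end up in RI[i], and here it is NA
# because row 1 is not in the same group as row i.
y_vec[1]

# Plain subsetting gives the single baseline value that is actually wanted.
y_direct <- x[slot == 1 & group == group[i]]
y_direct
# 52
```

Assigning the length-4 result to the single position RI[i] keeps only its first element (R warns about this), which is why every row outside the first group ends up as NA.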

Upvotes: 1

Joseph Clark McIntyre

Reputation: 1094

The problem is with the ifelse statement. ifelse(df$group == df$group[i], ...) returns a vector with one element per row, but RI[i] can hold only one value, so R keeps just the first element. That first element comes from comparing df$group[1] with df$group[i], so it is NA as soon as df$group[i] != df$group[1]. You don't need the ifelse, as far as I can see. The following code worked for me (though you should still check it manually to make sure that it's correct).

df <- data.frame(group = c(rep("one", 10), rep("two", 9), rep("three", 11)),
                 slot = c(1:10, 1:9, 1:11),
                 x = sample(100, 30))

RI_fun <- function(x, y) {
  ((x - y)/ y) * 100
}

RI <- rep(NA, 30)

for (i in seq_along(df$group)) {
  RI[i] = RI_fun(x = df$x[i],
                 y = (df$x[df$slot == 1 & df$group == df$group[i]]))
}

RI
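As an alternative to checking by hand, one option is to compare the loop against a loop-free computation. This is only a sketch in base R: it assumes every group has exactly one row with slot == 1, and the names RI_loop, RI_vec, base_rows and baseline are mine, not from the question.

```r
set.seed(123)
df <- data.frame(group = c(rep("one", 10), rep("two", 9), rep("three", 11)),
                 slot  = c(1:10, 1:9, 1:11),
                 x     = sample(100, 30))

RI_fun <- function(x, y) {
  ((x - y) / y) * 100
}

# Loop version, as above.
RI_loop <- rep(NA_real_, nrow(df))
for (i in seq_len(nrow(df))) {
  RI_loop[i] <- RI_fun(x = df$x[i],
                       y = df$x[df$slot == 1 & df$group == df$group[i]])
}

# Loop-free version: look up each row's baseline (the slot == 1 value
# of its group) with match(), then call RI_fun once on whole vectors.
base_rows <- df[df$slot == 1, ]
baseline  <- base_rows$x[match(df$group, base_rows$group)]
RI_vec    <- RI_fun(df$x, baseline)

# TRUE when both approaches agree.
all.equal(RI_loop, RI_vec)
```

Agreement between the two does not prove either is right, but a mismatch would point to a problem worth looking at.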

Upvotes: 0
