Reputation: 179

For Loop Function in R

I have been struggling to figure out why my function is not returning the correct values to my data frame. I want to loop over a column of my data frame and create a new column computed from that column's elements. Here's what I have:

# x will be the data frame's vector
y <- function(x){
 new <- c()
 for (i in x){
  new <- c(new, x[i] - x[i+1])
 }
 return (new)
}

So here I want to create a new vector that returns the next element subtracted from the current element. Now, when I apply it to my data frame:

df$new <- lapply(df$I, y)

I get all NAs. I know I'm missing something completely obvious...

Also, how would I execute the function that resets itself if df$ID changes so I am not subtracting elements from two different df$IDs? For example, my data frame will have

ID    I  Order  new
1001  5  1      1
1001  6  2      -2
1001  4  3      -2
1001  2  4      NA
1005  2  1      6
1005  8  2      0
1005  8  3      -2
1005  6  4      NA

Thanks!

Upvotes: 1

Answers (3)

beroe

Reputation: 12316

Rather than a loop, you would be better off using a vectorized version of the calculation. The exact indices will depend on what you want to do with the last value... (Note that this line does not go inside your for loop; it replaces the loop and gives the result directly.)

df$new = c(df$I[-1],NA) - df$I

Here you will be subtracting the original df$I from a shifted version that omits the first value [-1] and appends an NA at the end.
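To make the shift concrete, here is a minimal sketch using the I values from the question's example. The helper name next_minus_current is hypothetical and only there for comparison. It loops over positions with seq_len rather than over the values themselves, which is one likely reason the original loop gave NA: for (i in x) hands the loop the values 5, 6, 4, ... as indices, and lapply passes the function one element at a time, so x[i] points past the end of a length-one vector.

```r
# Example values copied from the question's I column
I <- c(5, 6, 4, 2, 2, 8, 8, 6)

# Hypothetical loop helper: iterate over positions, not over the values
next_minus_current <- function(x) {
  out <- rep(NA_real_, length(x))
  for (i in seq_len(max(length(x) - 1, 0))) {
    out[i] <- x[i + 1] - x[i]
  }
  out
}

loop_result  <- next_minus_current(I)

# Vectorized shift: drop the first value, pad with NA, subtract the original
shift_result <- c(I[-1], NA) - I

# Both approaches should agree element by element
print(identical(loop_result, shift_result))
```

The function is called once on the whole column (next_minus_current(df$I)), not through lapply, since it needs to see neighbouring elements.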

EDIT per comments: If you don't want to subtract across df$ID, you can blank out that subset of cells after subtraction:

 df$new[df$ID != c(df$ID[-1],NA)] = NA
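As a sketch of how the masking step plays out, the snippet below rebuilds the question's example data (assuming the rows are already sorted by ID and Order) and blanks the difference on the last row of each ID. The variable name boundary and the use of which() are my own additions; which() simply drops the NA that the comparison produces for the final row.

```r
# Example data rebuilt from the question (assumed sorted by ID, then Order)
df <- data.frame(
  ID    = c(1001, 1001, 1001, 1001, 1005, 1005, 1005, 1005),
  I     = c(5, 6, 4, 2, 2, 8, 8, 6),
  Order = c(1, 2, 3, 4, 1, 2, 3, 4)
)

# Next value minus current value, ignoring IDs for the moment
df$new <- c(df$I[-1], NA) - df$I

# TRUE where the following row belongs to a different ID;
# the last row compares against NA and so yields NA
boundary <- df$ID != c(df$ID[-1], NA)

# which() keeps only the TRUE positions, so the NA comparison is skipped
df$new[which(boundary)] <- NA

print(df)
```

The last row stays NA because the shifted vector already ends in NA, so only the rows where the ID changes need to be overwritten.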

Upvotes: 1

Gregor Thomas

Reputation: 145765

The dplyr library makes it very easy to do things separately for each level of a grouping variable, in your case ID. We can use diff as @Richard Scriven recommends, and use dplyr::mutate to add a new column.

> library(dplyr)
> df %>% group_by(ID) %>% mutate(new2 = c(diff(I), NA))
Source: local data frame [8 x 5]
Groups: ID

    ID I Order new new2
1 1001 5     1   1    1
2 1001 6     2  -2   -2
3 1001 4     3  -2   -2
4 1001 2     4  NA   NA
5 1005 2     1   6    6
6 1005 8     2   0    0
7 1005 8     3  -2   -2
8 1005 6     4  NA   NA

Upvotes: 1

Rich Scriven

Reputation: 99331

Avoid the loop and use diff. Everything is vectorized here so it's easy.

df$new <- c(diff(df$I), NA)
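A quick illustration of why the NA is appended, using the I values from the question: diff returns one fewer element than its input, so the result has to be padded to fit back into the data frame.

```r
# Example values copied from the question's I column
I <- c(5, 6, 4, 2, 2, 8, 8, 6)

# diff() gives I[i + 1] - I[i] for each consecutive pair
d <- diff(I)
print(length(I))  # length of the input
print(length(d))  # one shorter than the input

# Pad with NA so the result lines up with the original rows
padded <- c(d, NA)
print(padded)
```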

But I don't understand your example result. Why are some 0 values changed to NA and some are not? And shouldn't 8-2 be 6 and not -6? I think that needs to be clarified.

If the 0 values need to be changed to NA, just do the following after the above code.

df$new[df$new == 0] <- NA

A one-liner for the complete process, which returns the new data frame, is:

within(df, { new <- c(diff(I), NA); new[new == 0] <- NA })

Update: With respect to your comments below, my updated answer follows.

> M <- do.call(rbind, Map(function(x) { x$z <- c(diff(x$I), NA); x }, 
                          split(dat, dat$ID)))
> rownames(M) <- NULL
> M
    ID I Order  z
1 1001 5     1  1
2 1001 6     2 -2
3 1001 4     3 -2
4 1001 2     4 NA
5 1005 2     1  6
6 1005 8     2  0
7 1005 8     3 -2
8 1005 6     4 NA
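If the split/rbind steps feel heavy, a more compact base R alternative (not part of the answer above, just a sketch) is ave, which applies a function within each group and returns the results in the original row order. The data frame below is rebuilt from the question's example, and the name dat is reused from the code above.

```r
# Example data rebuilt from the question
dat <- data.frame(
  ID    = c(1001, 1001, 1001, 1001, 1005, 1005, 1005, 1005),
  I     = c(5, 6, 4, 2, 2, 8, 8, 6),
  Order = c(1, 2, 3, 4, 1, 2, 3, 4)
)

# ave() splits I by ID, applies FUN to each piece, and reassembles the result;
# each group's last row gets NA because diff() is padded per group
dat$z <- ave(dat$I, dat$ID, FUN = function(v) c(diff(v), NA))

print(dat)
```

This relies on FUN returning a vector of the same length as its input, which is why the NA padding happens inside the function.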

Upvotes: 2
