gh0strider18

Reputation: 1140

For i in loops in R

I have been really struggling to grasp a basic programming concept: the for loop. I typically deal with hierarchically structured data in which measurements repeat within levels of a unique identifier, like this:

ID  Measure
1   2
1   3
1   3
2   4
2   1
...

Very often I need to create a new column that either aggregates within ID or produces a value for each row within each level of ID. For the former I use pretty basic functions from either base R or dplyr, but for the latter case I'd like to get in the habit of writing for loops.
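To make the two cases concrete, here is a minimal base R sketch using the question's example data (the names per_id and GroupMean are illustrative only). aggregate() collapses to one row per ID, while ave() returns one value per original row, which is the shape the new column needs.

```r
# Example data from the question
df <- data.frame(ID = c(1, 1, 1, 2, 2), Measure = c(2, 3, 3, 4, 1))

# Aggregating within ID: one row per ID (per_id is an illustrative name)
per_id <- aggregate(Measure ~ ID, data = df, FUN = mean)
print(per_id)

# A value for every row: each row receives its own group's mean
df$GroupMean <- ave(df$Measure, df$ID, FUN = mean)
print(df)
```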

So for this example, I would like to add a column to my hypothetical df that starts at 1 for each ID and increases by 1 on each subsequent row until a new ID begins.

So, this:

ID  Measure NewVal
1   2       1
1   3       2
1   3       3
2   4       1
2   1       2
...

I would love to learn how to do this with a for loop, but if there are other ways, I would like to hear those too.
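For reference, a minimal for-loop sketch of the behaviour described above. It assumes that all rows sharing an ID are contiguous (for example, the data are already sorted by ID); it is one possible loop, not the only way to write it.

```r
# Example data from the question (rows already grouped by ID)
df <- data.frame(ID = c(1, 1, 1, 2, 2), Measure = c(2, 3, 3, 4, 1))

df$NewVal <- NA_integer_  # pre-allocate the new column
for (i in seq_len(nrow(df))) {
  if (i == 1 || df$ID[i] != df$ID[i - 1]) {
    df$NewVal[i] <- 1L                     # first row of a new ID: restart at 1
  } else {
    df$NewVal[i] <- df$NewVal[i - 1] + 1L  # same ID as the previous row: add 1
  }
}
print(df)
```

If the IDs are interleaved, sorting first with df <- df[order(df$ID), ] makes this loop applicable.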

Upvotes: 0

Views: 152

Answers (5)

akrun

Reputation: 887088

Or you could use ave. The advantage is that it returns the sequence aligned with the original row order, which matters when the dataset is not sorted by ID.
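A small sketch of that advantage, using a hypothetical unsorted data frame (df_unordered is an illustrative name, not part of the question). Each row gets its position within its own ID group, without reordering the rows.

```r
# Hypothetical unsorted data: rows for each ID are interleaved
df_unordered <- data.frame(ID = c(2, 1, 2, 1, 1), Measure = c(4, 2, 1, 3, 3))

# ave() writes each group's result back into that group's original row positions
df_unordered$NewVal <- ave(df_unordered$ID, df_unordered$ID, FUN = seq_along)
print(df_unordered)
```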

transform(df, NewVal=ave(ID, ID, FUN=seq_along))
#  ID Measure NewVal
#1  1       2      1
#2  1       3      2
#3  1       3      3
#4  2       4      1
#5  2       1      2

For a more general case (if the ID column is a factor):

transform(df, NewVal=ave(seq_along(ID), ID, FUN=seq_along))

Or, if the rows are already sorted by ID:

df$NewVal <- sequence(tabulate(df$ID))
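To see how that one-liner works, a step-by-step sketch on the question's data. tabulate() counts the rows per ID and sequence() expands each count n into 1:n. The rle() variant (an alternative, not part of the answer above) counts runs of consecutive identical IDs, so it also handles non-integer IDs such as character codes, provided each ID's rows are contiguous. The names counts and NewVal2 are illustrative.

```r
# Example data from the question, sorted by ID
df <- data.frame(ID = c(1L, 1L, 1L, 2L, 2L), Measure = c(2L, 3L, 3L, 4L, 1L))

# Number of rows per ID (counts is an illustrative name)
counts <- tabulate(df$ID)

# Expand each count n into 1:n and concatenate
df$NewVal <- sequence(counts)

# Alternative: run lengths of consecutive identical IDs
df$NewVal2 <- sequence(rle(df$ID)$lengths)
print(df)
```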

Or, using dplyr:

library(dplyr)
df %>%
  group_by(ID) %>%
  mutate(NewVal = row_number())

Data
df <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L), Measure = c(2L, 3L, 
3L, 4L, 1L)), .Names = c("ID", "Measure"), class = "data.frame", 
row.names = c(NA, -5L))

Upvotes: 2

KFB

Reputation: 3501

You could also use data.table to assign the sequence by reference.

library(data.table)
setDT(mydf)  ## convert to data table
mydf[,NewVal := seq(.N), by=ID]  ## .N contains number of rows in each ID group

#    ID Measure NewVal
# 1:  1       2      1
# 2:  1       3      2
# 3:  1       3      3
# 4:  2       4      1
# 5:  2       1      2

setDF(mydf)  ## convert easily to data frame if you wish.

Upvotes: 2

Matthew Lundberg

Reputation: 42639

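seq_along gives an increasing sequence starting at 1, with the same length as its input. tapply applies a function to each group of its input defined by a grouping variable. Here we only care how many values each group has, not what they are, so you can apply the ID column to itself. Because unlist concatenates the results group by group, this lines up with the rows only when the data are sorted by ID: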

> d$NewVal <- unlist(tapply(d$ID, d$ID, FUN=seq_along))
> d
  ID Measure NewVal
1  1       2      1
2  1       3      2
3  1       3      3
4  2       4      1
5  2       1      2
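When the rows are not sorted, a base R sketch of one way to keep the group-by-group idea while restoring the original row order is split() plus unsplit(). The unsorted data frame and the name grouped are illustrative assumptions.

```r
# Hypothetical unsorted data: rows for each ID are not contiguous
d <- data.frame(ID = c(2, 1, 2, 1, 1), Measure = c(4, 2, 1, 3, 3))

# tapply() + unlist() returns the sequences ordered by ID (all of ID 1, then ID 2),
# so they no longer match the row order of d
grouped <- unlist(tapply(d$ID, d$ID, FUN = seq_along))
print(grouped)

# unsplit() puts each group's values back into the original row positions
d$NewVal <- unsplit(lapply(split(d$ID, d$ID), seq_along), d$ID)
print(d)
```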

Upvotes: 2

alexwhitworth

Reputation: 4907

I'd recommend you don't use a for loop for this; it's not a good place for one. You can do this pretty easily in plyr (or dplyr, if you prefer):

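library(plyr)
x <- data.frame(X1 = rnorm(100), X2 = rnorm(100))
x$ID <- sample(1:10, 100, replace = TRUE)

new_col <- function(x) {
  # sort this group's rows by the first column, then number them 1..n
  x <- x[order(x[, 1]), ]
  x$NewVal <- seq_len(nrow(x))
  return(x)
}

# apply new_col() to each ID group and combine the results
x <- ddply(.data = x, .variables = "ID", .fun = new_col)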
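The same split-apply-combine pattern that ddply() automates can be sketched in base R without plyr. This version is an assumption-laden illustration: add_seq, pieces and result are hypothetical names, the data are smaller random values, and unlike new_col() it numbers rows in their existing order within each group rather than sorting by the first column.

```r
set.seed(1)  # make the random example reproducible
x <- data.frame(X1 = rnorm(20), X2 = rnorm(20))
x$ID <- sample(1:4, 20, replace = TRUE)

# Hypothetical helper: number the rows of one group 1..n in their current order
add_seq <- function(g) {
  g$NewVal <- seq_len(nrow(g))
  g
}

# split -> apply -> combine
pieces <- lapply(split(x, x$ID), add_seq)
result <- do.call(rbind, pieces)
rownames(result) <- NULL
print(head(result))
```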

Upvotes: 1

jazzurro

Reputation: 23574

One way is to use the splitstackshape package. It has a function called getanID, which is your friend here. If your data frame is called mydf, you would do the following. Note that the result is a data.table; convert it to a data.frame if necessary.

library(splitstackshape)
getanID(mydf, "ID")

#   ID Measure .id
#1:  1       2   1
#2:  1       3   2
#3:  1       3   3
#4:  2       4   1
#5:  2       1   2

DATA

mydf <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L), Measure = c(2L, 3L, 
3L, 4L, 1L)), .Names = c("ID", "Measure"), class = "data.frame", row.names = c(NA, 
-5L))

Upvotes: 3
