Reputation: 1140
I have been really struggling to grasp a basic programming concept: the for loop. I typically deal with hierarchically structured data in which measurements repeat within levels of a unique identifier, like this:
ID Measure
1 2
1 3
1 3
2 4
2 1
...
Very often I need to create a new column that either aggregates within ID or produces a value for each row within each level of ID. For the former I use pretty basic functions from either base or dplyr, but for the latter case I'd like to get in the habit of writing for loops.
So for this example, I would like a column added to my hypothetical df such that the new column begins with 1 for each ID and adds 1 to each subsequent row, until a new ID occurs.
So, this:
ID Measure NewVal
1 2 1
1 3 2
1 3 3
2 4 1
2 1 2
...
I would love to learn how to compute this with a for loop, but if there are other ways, I would like to hear those too.
Upvotes: 0
Views: 152
Reputation: 887088
Or you could use ave. The advantage is that it returns the sequence in the same row order as the original dataset, which may be beneficial in unordered datasets.
transform(df, NewVal=ave(ID, ID, FUN=seq_along))
# ID Measure NewVal
#1 1 2 1
#2 1 3 2
#3 1 3 3
#4 2 4 1
#5 2 1 2
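To make the point about unordered data concrete, here is a small sketch. The data frame df_unsorted is a made-up example (the question's values, shuffled), not something from the original post.

```r
# df_unsorted is a hypothetical example: the question's rows in shuffled order.
df_unsorted <- data.frame(ID = c(2L, 1L, 2L, 1L, 1L),
                          Measure = c(4L, 2L, 1L, 3L, 3L))

# ave() writes each group's result back into that group's original row positions,
# so the counter follows the rows as they appear, without sorting first.
df_unsorted$NewVal <- ave(df_unsorted$ID, df_unsorted$ID, FUN = seq_along)
df_unsorted$NewVal
# 1 1 2 2 3
```

Each ID still counts 1, 2, 3, ... in order of appearance, even though the IDs are interleaved.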
For a more general case (if the ID column is a factor):
transform(df, NewVal=ave(seq_along(ID), ID, FUN=seq_along))
Or, if the rows are already sorted by ID (tabulate expects positive integer IDs or a factor):
df$NewVal <- sequence(tabulate(df$ID))
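To unpack that one-liner, the sketch below runs the two steps separately on the question's IDs. The rle() variant at the end is an assumed alternative for sorted IDs that are not positive integers; it and the ids_chr vector are illustrative, not part of the original answer.

```r
ids <- c(1L, 1L, 1L, 2L, 2L)   # IDs from the question, already sorted

counts <- tabulate(ids)        # number of rows per ID: 3 2
new_val <- sequence(counts)    # one 1..n run per count: 1 2 3 1 2

# Assumed variant for sorted, non-integer IDs (ids_chr is a made-up example):
# rle() returns the length of each run of identical values.
ids_chr <- c("a", "a", "a", "b", "b")
new_val_chr <- sequence(rle(ids_chr)$lengths)   # 1 2 3 1 2
```

Both versions rely on the rows being sorted by ID, because the runs are laid out one group after another.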
Or using dplyr:
library(dplyr)
df %>%
  group_by(ID) %>%
  mutate(NewVal = row_number())
data
df <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L), Measure = c(2L, 3L,
3L, 4L, 1L)), .Names = c("ID", "Measure"), class = "data.frame",
row.names = c(NA, -5L))
Upvotes: 2
Reputation: 3501
You could also use data.table to assign the sequence by reference.
library(data.table)
setDT(mydf) ## convert to a data.table by reference
mydf[, NewVal := seq_len(.N), by = ID] ## .N is the number of rows in each ID group
# ID Measure NewVal
# 1: 1 2 1
# 2: 1 3 2
# 3: 1 3 3
# 4: 2 4 1
# 5: 2 1 2
setDF(mydf) ## convert easily to data frame if you wish.
Upvotes: 2
Reputation: 42639
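seq_along gives an increasing sequence starting at 1, with the same length as its input. tapply is used to apply a function within each level of a grouping variable. Here we don't care what values are supplied, so you can apply the ID column to itself. Because unlist returns the results one group after another, this assumes the rows are already sorted by ID: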
> d$NewVal <- unlist(tapply(d$ID, d$ID, FUN=seq_along))
> d
ID Measure NewVal
1 1 2 1
2 1 3 2
3 1 3 3
4 2 4 1
5 2 1 2
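If the rows are not sorted by ID, the unlisted result comes back level by level and no longer lines up with the rows. The sketch below shows this on a made-up vector of unsorted IDs, and one possible way to map the values back using order(); the variable names are illustrative assumptions.

```r
ids <- c(2L, 1L, 2L, 1L, 1L)   # hypothetical, unsorted IDs

# tapply() returns one result per ID level; unlist() concatenates them
# level by level (all of ID 1 first, then all of ID 2).
by_level <- unname(unlist(tapply(ids, ids, FUN = seq_along)))
by_level                       # 1 2 3 1 2

# order(ids) lists the row positions in that same level-by-level order
# (ties keep their original order), so it can route each value back to its row.
new_val <- integer(length(ids))
new_val[order(ids)] <- by_level
new_val                        # 1 1 2 2 3
```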
Upvotes: 2
Reputation: 4907
I'd recommend you don't use a for loop for this. It's not a good place for one. You can do this pretty easily in plyr (or dplyr) if you prefer:
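library(plyr)
# Simulated data: two numeric columns (X1, X2) plus a random ID from 1 to 10
x <- data.frame(cbind(rnorm(100), rnorm(100)))
x$ID <- sample(1:10, 100, replace = TRUE)
# Within one ID group, sort by the first column, then number the rows
new_col <- function(x) {
  x <- x[order(x[, 1]), ]
  x$NewVal <- seq_len(nrow(x))
  return(x)
}
x <- ddply(.data = x, .variables = "ID", .fun = new_col)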
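For comparison, since the question asks about for loops, here is a minimal base R sketch of a loop-based version on the question's toy data. The counts vector and the other variable names are illustrative assumptions rather than part of any answer, and the grouped approaches remain the shorter route.

```r
# Toy data from the question; the loop does not need the rows sorted by ID.
df <- data.frame(ID = c(1L, 1L, 1L, 2L, 2L),
                 Measure = c(2L, 3L, 3L, 4L, 1L))

# Pre-allocate the result instead of growing it inside the loop.
df$NewVal <- integer(nrow(df))

# counts is a hypothetical helper: one running counter per distinct ID,
# named by the ID value so it can be looked up by name.
ids <- as.character(df$ID)
counts <- setNames(integer(length(unique(ids))), unique(ids))

for (i in seq_len(nrow(df))) {
  id <- ids[i]
  counts[[id]] <- counts[[id]] + 1L   # bump this ID's counter
  df$NewVal[i] <- counts[[id]]        # record the running position for row i
}

df$NewVal
# 1 2 3 1 2
```

Using seq_len(nrow(df)) rather than 1:nrow(df) keeps the loop from running on an empty data frame.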
Upvotes: 1
Reputation: 23574
One way is to use the splitstackshape package. It has a function called getanID, which is your friend here. If your df is called mydf, you would do the following. Note that the result is a data.table; convert it to a data.frame if necessary.
library(splitstackshape)
getanID(mydf, "ID")
# ID Measure .id
#1: 1 2 1
#2: 1 3 2
#3: 1 3 3
#4: 2 4 1
#5: 2 1 2
DATA
mydf <- structure(list(ID = c(1L, 1L, 1L, 2L, 2L), Measure = c(2L, 3L,
3L, 4L, 1L)), .Names = c("ID", "Measure"), class = "data.frame", row.names = c(NA,
-5L))
Upvotes: 3