Reputation: 1773
I've got an existing data.frame that contains some initial values. I want to create another data.frame that has 10 randomly sampled rows for every row in the first one. I'd also like to do this in idiomatic R, so I want to avoid explicit iteration.
So far I've managed to apply a function to every row of the table that generates one value, but I'm not sure how to extend this to generate 10 rows per application and then rbind the results back together.
Here's my progress so far:
Sample data:
starts <- structure(list(instance = structure(21:26, .Label = c("big_1",
"big_10", "big_11", "big_12", "big_13", "big_14", "big_15", "big_16",
"big_17", "big_18", "big_19", "big_2", "big_20", "big_3", "big_4",
"big_5", "big_6", "big_7", "big_8", "big_9", "competition01",
"competition02", "competition03", "competition04", "competition05",
"competition06", "competition07", "competition08", "competition09",
"competition10", "competition11", "competition12", "competition13",
"competition14", "competition15", "competition16", "competition17",
"competition18", "competition19", "competition20", "med_1", "med_10",
"med_11", "med_12", "med_13", "med_14", "med_15", "med_16", "med_17",
"med_18", "med_19", "med_2", "med_20", "med_3", "med_4", "med_5",
"med_6", "med_7", "med_8", "med_9", "small_1", "small_10", "small_11",
"small_12", "small_13", "small_14", "small_15", "small_16", "small_17",
"small_18", "small_19", "small_2", "small_20", "small_3", "small_4",
"small_5", "small_6", "small_7", "small_8", "small_9"), class = "factor"),
event.clashes = c(674L, 626L, 604L, 1036L, 991L, 929L), overlaps = c(0L,
0L, 0L, 0L, 0L, 0L), room.valid = c(324L, 320L, 268L, 299L,
294L, 220L), final.timeslot = c(0L, 0L, 0L, 0L, 0L, 0L),
three.in.a.row = c(246L, 253L, 259L, 389L, 365L, 430L), single.event = c(97L,
120L, 97L, 191L, 150L, 138L)), .Names = c("instance", "event.clashes",
"overlaps", "room.valid", "final.timeslot", "three.in.a.row",
"single.event"), row.names = c(NA, 6L), class = "data.frame")
Code:
library(reshape)
m.starts <- melt(starts)
df <- data.frame()
gen.data <- function(x){
    inst <- x[1]
    constr <- x[2]
    v <- as.integer(x[3])
    val <- as.integer(rnorm(1, max(0, v), v / 2))
    # Should probably return a data.frame here
    print(paste(inst, constr, val))
}
apply(m.starts, 1, gen.data)
Upvotes: 2
Views: 14741
Reputation: 173547
You can combine the ideas of Andrie and Chase's solutions as follows:
# Repeat each row of the molten data (m.starts from the question) ten times
start.m1 <- m.starts[rep(1:nrow(m.starts), each = 10), ]
# Create an extended vector used to define
# the means and standard deviations
m <- rep(m.starts$value, each = 10)
# Clamp negative values to zero,
# although there were none in your data
m[m <= 0] <- 0
# Replace value with rnorm draws
start.m1$value <- rnorm(nrow(start.m1), mean = m, sd = m / 2)
which yields something that looks like this:
> head(start.m1)
instance variable value
1 competition01 event.clashes 1098.0220
1.1 competition01 event.clashes 1208.4304
1.2 competition01 event.clashes 883.7976
1.3 competition01 event.clashes 365.1396
1.4 competition01 event.clashes 862.3113
1.5 competition01 event.clashes 1352.7085
I'm using Andrie's suggestion to use subset indexing to extend the data frame, and then Chase's interpretation of your question, wherein you seem to want the values to actually be generated via rnorm, rather than resampling the original rows themselves. The key here is that rnorm is vectorized.
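To make the vectorization point concrete, here is a minimal sketch on toy numbers (the vector m and its values are assumptions for illustration, not your data). Each draw uses the mean and sd at the matching position, so no loop is needed:

```r
# Toy means, assumed for illustration only
m <- c(10, 100, 1000)

# With sd = 0 every draw equals its own mean, which shows that
# draw i is paired with mean[i] and sd[i]
exact <- rnorm(length(m), mean = m, sd = 0)

# Repeat each mean to get several draws per original value
m.rep <- rep(m, each = 4)

set.seed(42)  # only to make the sketch reproducible
draws <- rnorm(length(m.rep), mean = m.rep, sd = m.rep / 2)
length(draws)
```

The same pairing is what lets the code above fill the whole value column with a single rnorm call.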
Upvotes: 0
Reputation: 179418
There is no need for apply or rbind. A simple vector subsetting is all that is required:
samples <- sample(1:nrow(starts), nrow(starts)*10, replace=TRUE)
starts[samples, 1:3]
The first 5 rows of results:
> head(starts[samples, 1:3], 5)
instance event.clashes overlaps
2 competition02 626 0
5 competition05 991 0
6 competition06 929 0
4 competition04 1036 0
2.1 competition02 626 0
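Note that sampling indices with replacement gives 10 rows per original row only on average. If you need exactly 10 copies of each row, rep can build the index instead. A small sketch on a toy data frame (toy and n.rep are hypothetical names used only here):

```r
# Toy data frame, assumed for illustration
toy <- data.frame(instance = c("a", "b", "c"), value = c(1L, 2L, 3L))
n.rep <- 10

set.seed(1)  # only to make the sketch reproducible
# Random resampling: 30 rows in total, counts per original row vary
idx.random <- sample(seq_len(nrow(toy)), nrow(toy) * n.rep, replace = TRUE)
resampled <- toy[idx.random, ]

# Deterministic repetition: exactly n.rep copies of every original row
idx.each <- rep(seq_len(nrow(toy)), each = n.rep)
repeated <- toy[idx.each, ]

# Count how many times each original row appears
table(repeated$instance)
```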
Upvotes: 1
Reputation: 69171
It's unclear to me what you're really doing, but the following changes to your gen.data function seem to do what you want. Specifically, it's not clear to me what you are doing with val, as this seemingly just generates a random number with a mean of the value column for that row and a standard deviation of that value divided by two. Is that what you want? I also added a new parameter, nreps, to your function to set the number of rows you want to generate per input row:
gen.data <- function(x, nreps = 10){
    inst <- x[1]
    constr <- x[2]
    v <- as.integer(x[3])
    val <- as.integer(rnorm(nreps, max(0, v), v / 2))
    out <- data.frame(inst = rep(inst, nreps)
                      , constr = rep(constr, nreps)
                      , val = val)
    return(out)
}
And then in use:
do.call("rbind", apply(m.starts, 1, gen.data))
Results in:
inst constr val
1 competition01 event.clashes 876
2 competition01 event.clashes 714
3 competition01 event.clashes 912
4 competition01 event.clashes -46
5 competition01 event.clashes 369
....
....
357 competition06 single.event 149
358 competition06 single.event 248
359 competition06 single.event 128
360 competition06 single.event 168
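One thing to be aware of: apply converts the data frame to a character matrix before calling the function, which is why as.integer is needed inside gen.data. A possible variant, sketched here on toy data, loops over row indices with lapply so each column keeps its type. The names toy and gen.rows are hypothetical, and unlike the function above this sketch clamps v before using it for both the mean and the sd:

```r
# Toy molten data, assumed for illustration (same columns as melt() output)
toy <- data.frame(instance = c("a", "a", "b"),
                  variable = c("x", "y", "x"),
                  value    = c(100L, 0L, 50L))

# Build nreps rows for row i of df; columns are read by name,
# so no character coercion takes place
gen.rows <- function(i, df, nreps = 10) {
  v <- max(0, df$value[i])  # clamp negative starting values to zero
  data.frame(inst   = df$instance[i],
             constr = df$variable[i],
             val    = as.integer(rnorm(nreps, mean = v, sd = v / 2)))
}

set.seed(123)  # only to make the sketch reproducible
out <- do.call(rbind, lapply(seq_len(nrow(toy)), gen.rows, df = toy))
nrow(out)
```

As in the output above, individual draws can still be negative, because only the mean and sd are clamped and not the sampled values.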
Upvotes: 9