Pablo Báez

Reputation: 37

How to write a loop or function to get a matrix with repetitions of values from a data frame?

I'm trying to build a data frame from another one by repeating certain values (a, b, c and d in my example) a given number of times, where the counts are the values in the cells of my first data frame. To illustrate, here is the data:

df<-data.frame(replicate(4,sample(20:50,10,rep=TRUE)))
a<-0
b<-1
c<-2
d<-9

I tried first:

for (i in 1:10)
{
print(rep(a, df[i,1]))
}

But when I tried to save the output, only one row's result is kept:

for (i in 1:10)
{
output<-print(rep(a, df[i,1]))
}

Then I tried with something more complex like:

myfunc<-function(n){
  a<-0
  b<-1
  c<-2
  d<-9
  IDs<- matrix(n[,1]) #A new column with the IDs for each row(rownames)
  w = NULL
  x = NULL
  y = NULL
  z = NULL
  for (i in 1:nrow(n)) {
    w<-rbind(t(as.matrix(rep(a, n[i,1]))))
    x<-rbind(t(as.matrix(rep(b, n[i,2]))))
    y<-rbind(t(as.matrix(rep(c, n[i,3]))))
    z<-rbind(t(as.matrix(rep(d, n[i,4]))))
  }
  output<-cbind(IDs, w, x, y, z)
  return(output <- as.data.frame(output))
}

But I do not get what I need.

For a matrix like this:

Example matrix

The expected output will be:

First row: 21 times 0, 46 times 1, 25 times 2 and 28 times 9, all in 120 columns, and so on for the other rows.

I would really appreciate it if you could help me solve this issue.

Upvotes: 1

Views: 142

Answers (2)

AkselA

Reputation: 8836

I take it that what you expect from the first row of the matrix is

r1 <- rep(c(0, 1, 2, 9), times=c(21, 46, 25, 28))

and from the second row is

r2 <- rep(c(0, 1, 2, 9), times=c(47, 46, 45, 46))

?

If so, then you have a problem with unequal lengths if you want to fit this into a data frame.

length(r1)
# [1] 120

length(r2)
# [1] 184

Data frames can't deal with that, but lists can:

l <- list(r1, r2)

To do this for all the rows in your matrix you could do something like

mat <- matrix(c(21, 46, 25, 28,
                47, 46, 45, 46,
                35, 24, 46, 42,
                27, 22, 36, 50), 4, byrow=TRUE)

l <- list()

for (row in 1:4) {
    l[[row]] <- rep(c(0, 1, 2, 9), times=c(mat[row, 1], mat[row, 2], 
                                           mat[row, 3], mat[row, 4]))
}

sapply(l, length)
# [1] 120 184 147 135

I don't know where 0 1 2 9 came from; if it varies, you'll have to factor that in as well. If there is a larger, or varying, number of columns in the actual matrix, you'd probably be better off with a nested loop or some lapply magic as suggested by Daniel.
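For a matrix with any number of columns, the loop above can be written without naming each column, because `rep()` accepts a whole vector for `times`. A minimal sketch, assuming the values to repeat are kept in a vector `vals` (a name made up here) with one entry per column:

```r
# Counts per row, taken from the example matrix above
mat <- matrix(c(21, 46, 25, 28,
                47, 46, 45, 46,
                35, 24, 46, 42,
                27, 22, 36, 50), 4, byrow=TRUE)

# Assumed: one value to repeat per column of mat
vals <- c(0, 1, 2, 9)

# One vector per row; mat[i, ] supplies all repeat counts at once
l2 <- lapply(seq_len(nrow(mat)), function(i) rep(vals, times=mat[i, ]))

sapply(l2, length)
# [1] 120 184 147 135
```

If the values differ per row as well, `vals` would have to become a matrix indexed the same way.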

If you really want a matrix/data frame type structure you can get it by padding with NAs, for example like this

mat.new <- t(sapply(l, '[', seq(max(sapply(l, length)))))
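The padding works because indexing past the end of a vector returns NA. A small sketch with toy vectors (not the data from the question) shows the effect:

```r
# Two vectors of unequal length (toy data for illustration)
l <- list(c(0, 0, 1), c(0, 1, 1, 9, 9))

# The longest vector decides the number of columns
n <- max(sapply(l, length))

# Each vector is indexed with 1..n; positions beyond its end become NA
mat.new <- t(sapply(l, '[', seq(n)))
mat.new
#      [,1] [,2] [,3] [,4] [,5]
# [1,]    0    0    1   NA   NA
# [2,]    0    1    1    9    9
```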

Upvotes: 0

Daniel Anderson

Reputation: 2424

If I'm understanding correctly, moving from a for loop to lapply should get you what you want.

 lapply(1:10, function(i) rep(a, df[i, 1]))

You can then generalize that for all columns by

l <- list(a = 0, b = 1, c = 2, d = 9)
lapply(seq_along(l), function(i) lapply(1:10, function(j) rep(l[[i]], df[j, i])))

This gives you a nested list and (I think) your desired output.
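The nested list is grouped by value first and by row second, so the pieces that belong to one row still have to be collected. A sketch of that step, using a small data frame with fixed, made-up counts so the result is reproducible (`nested` and `rows` are names chosen for this example):

```r
# Small stand-in for df with fixed counts (assumed for illustration)
df <- data.frame(X1 = c(2, 1), X2 = c(1, 3), X3 = c(2, 2), X4 = c(1, 1))
l <- list(a = 0, b = 1, c = 2, d = 9)

# nested[[i]][[j]] holds value i repeated df[j, i] times
nested <- lapply(seq_along(l), function(i) {
    lapply(seq_len(nrow(df)), function(j) rep(l[[i]], df[j, i]))
})

# Take the j-th piece of every value and concatenate them to build row j
rows <- lapply(seq_len(nrow(df)), function(j) unlist(lapply(nested, "[[", j)))

rows[[1]]
# [1] 0 0 1 2 2 9
```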

Edit

Now that I understand better what you want, I think I can help more. It seems to me that you have an issue here: you want a matrix but, at least in the example you've provided, each row of the matrix would be of a different length. Rather than padding the rows with NA, I created a fifth column that evens things out. See if the code below gets at what you want.

df$X5 <- (max(rowSums(df)) + 5) - rowSums(df)

l <- list(a = 0, b = 1, c = 2, d = 9, e = 5)

tmp <- lapply(seq_along(l), function(i) {
    lapply(1:nrow(df), function(j) rep(l[[i]], df[j, i]))
})

max_col <- max(rowSums(df))

# One row per row of df (not per value in l)
m <- matrix(NA, nrow = nrow(df), ncol = max_col)

for(i in seq_len(nrow(df))) {
    # i-th piece of every value, concatenated into row i
    m[i, ] <- unlist(lapply(tmp, "[[", i))
}
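Once every row has the same total, the same matrix can also be built in one step with `apply()`. This is an alternative sketch, not the code from the answer; it uses fixed, made-up counts, and `vals` is a name introduced here:

```r
# Fixed counts (assumed for illustration); the rows sum to 6 and 7
df <- data.frame(X1 = c(2, 1), X2 = c(1, 3), X3 = c(2, 2), X4 = c(1, 1))

# Filler column so that every row has the same total (longest row + 5)
df$X5 <- (max(rowSums(df)) + 5) - rowSums(df)

# Values to repeat; the last one (5) marks the filler
vals <- c(0, 1, 2, 9, 5)

# apply() returns one column per row of df, so transpose the result
m <- t(apply(df, 1, function(cnt) rep(vals, times = cnt)))

dim(m)
# [1]  2 12
```

This relies on all rows having equal totals; with unequal totals `apply()` would return a list instead of a matrix.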

Upvotes: 1
