Stacey John

Reputation: 151

How to stack the data in R in the following format?

I have a data frame which looks like this:

  inten      new.probes
  12.28280      AFFX-r2-P1-cre-5_at
  12.35039      AFFX-r2-P1-cre-5_at
  12.38397      AFFX-r2-P1-cre-5_at
  12.36304      AFFX-r2-P1-cre-5_at
  12.16271      AFFX-r2-P1-cre-5_at
  12.70304      AFFX-r2-P1-cre-3_at
  12.28280      AFFX-r2-P1-cre-3_at
  12.35039      AFFX-r2-P1-cre-3_at
  12.38397      AFFX-r2-P1-cre-3_at
  12.36304      AFFX-r2-P1-cre-3_at
  12.16271      AFFX-r2-P1-cre-2_at
  12.70304      AFFX-r2-P1-cre-2_at 
  12.16271      AFFX-r2-P1-cre-2_at
  12.70304      AFFX-r2-P1-cre-2_at

(The data are in two columns: the signal intensity values in one and the probe names in the other.) I want to convert the data frame to the following layout, with one row per probe:

AFFX-r2-P1-cre-5_at 12.28280 12.35039  12.38397  12.36304   12.16271 
AFFX-r2-P1-cre-3_at 12.28280 12.35039  12.38397  12.36304   12.16271 
AFFX-r2-P1-cre-2_at 12.38304 12.36304  12.38397  12.16271   12.70304

Any suggestions are welcome. It is a large dataset, and I have shown only a fraction of it here.

Upvotes: 1

Views: 139

Answers (2)

A5C1D2H2I1M1N2O1R2T1

Reputation: 193517

This is how I would approach this problem:

  1. Make sure that the new.probes variable is sorted.
  2. Use sequence() and rle() to generate a "time" variable for each new.probes.
  3. Use reshape() to transform the data.

Here's a worked example with your sample data (assuming it's named "DF").

DF <- DF[order(DF$new.probes), ]                            # step 1: sort by probe
DF$time <- sequence(rle(as.vector(DF$new.probes))$lengths)  # step 2: counter 1, 2, ... within each probe
reshape(DF, direction = "wide", idvar = "new.probes", timevar = "time")  # step 3: long to wide
#             new.probes  inten.1  inten.2  inten.3  inten.4  inten.5
# 11 AFFX-r2-P1-cre-2_at 12.16271 12.70304 12.16271 12.70304       NA
# 6  AFFX-r2-P1-cre-3_at 12.70304 12.28280 12.35039 12.38397 12.36304
# 1  AFFX-r2-P1-cre-5_at 12.28280 12.35039 12.38397 12.36304 12.16271
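
To see what step 2 does on its own, here is a minimal sketch on a toy vector (the values in probes are made up for illustration, not taken from your data). rle() gives the run lengths of an already-sorted vector, and sequence() turns those lengths into a within-group counter, which is what reshape() then uses as its timevar:

```r
# Hypothetical, already-sorted grouping vector (stand-in for DF$new.probes)
probes <- c("a", "a", "a", "b", "b", "c")

# Run-length encoding: how many consecutive times each value occurs
runs <- rle(probes)
runs$lengths   # 3 2 1

# sequence() expands each length n into 1:n and concatenates the results
time <- sequence(runs$lengths)
time           # 1 2 3 1 2 1
```

This only works if equal values are adjacent, which is why the sort in step 1 comes first.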

Or, if you prefer the syntax in reshape2 to base R's reshape, replace step 3 with:

library(reshape2)
dcast(DF, new.probes ~ time, value.var = "inten")

Upvotes: 1

Andrie

Reputation: 179418

If you had the same number of elements for each value of new.probes, you could have used:

do.call(rbind, unstack(dat))
                        [,1]     [,2]     [,3]     [,4]     [,5]
AFFX-r2-P1-cre-2_at 12.16271 12.70304 12.16271 12.70304 12.16271
AFFX-r2-P1-cre-3_at 12.70304 12.28280 12.35039 12.38397 12.36304
AFFX-r2-P1-cre-5_at 12.28280 12.35039 12.38397 12.36304 12.16271
Warning message:
In function (..., deparse.level = 1)  :
  number of columns of result is not a multiple of vector length (arg 1)

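But with your data this is clearly wrong: rbind() recycles the shorter vector to fill its row (hence the warning and the spurious fifth value for AFFX-r2-P1-cre-2_at). You need to pad the shorter vectors with NA: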

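x <- unstack(dat)             # list of intensity vectors, one per probe
m <- max(sapply(x, length))   # length of the longest vector
# Pad each vector with NA up to length m, then bind the vectors as rows
do.call(rbind, lapply(x, function(v) c(v, rep(NA, m - length(v)))))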

                        [,1]     [,2]     [,3]     [,4]     [,5]
AFFX-r2-P1-cre-2_at 12.16271 12.70304 12.16271 12.70304       NA
AFFX-r2-P1-cre-3_at 12.70304 12.28280 12.35039 12.38397 12.36304
AFFX-r2-P1-cre-5_at 12.28280 12.35039 12.38397 12.36304 12.16271
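
If you want to try this without your full data set, the following self-contained sketch rebuilds the sample from the question and wraps the padding step in a small helper. The name pad_to_matrix is made up for illustration, and it uses split() in place of unstack() so that it works on two plain vectors; treat it as one possible arrangement, not the only way to do it:

```r
# Sample data reconstructed from the question (5, 5 and 4 rows per probe)
dat <- data.frame(
  inten = c(12.28280, 12.35039, 12.38397, 12.36304, 12.16271,
            12.70304, 12.28280, 12.35039, 12.38397, 12.36304,
            12.16271, 12.70304, 12.16271, 12.70304),
  new.probes = rep(c("AFFX-r2-P1-cre-5_at",
                     "AFFX-r2-P1-cre-3_at",
                     "AFFX-r2-P1-cre-2_at"),
                   times = c(5, 5, 4))
)

# Hypothetical helper: one row per group, short groups padded with NA
pad_to_matrix <- function(values, groups) {
  parts <- split(values, groups)   # list of vectors, one per group
  m <- max(lengths(parts))         # length of the longest group
  do.call(rbind, lapply(parts, function(v) c(v, rep(NA, m - length(v)))))
}

wide <- pad_to_matrix(dat$inten, dat$new.probes)
wide
```

The result is a numeric matrix with the probe names as row names; wrap it in as.data.frame() if you need a data frame.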

Upvotes: 3
