Reputation: 151
I have a data frame which looks like this:
inten new.probes
12.28280 AFFX-r2-P1-cre-5_at
12.35039 AFFX-r2-P1-cre-5_at
12.38397 AFFX-r2-P1-cre-5_at
12.36304 AFFX-r2-P1-cre-5_at
12.16271 AFFX-r2-P1-cre-5_at
12.70304 AFFX-r2-P1-cre-3_at
12.28280 AFFX-r2-P1-cre-3_at
12.35039 AFFX-r2-P1-cre-3_at
12.38397 AFFX-r2-P1-cre-3_at
12.36304 AFFX-r2-P1-cre-3_at
12.16271 AFFX-r2-P1-cre-2_at
12.70304 AFFX-r2-P1-cre-2_at
12.16271 AFFX-r2-P1-cre-2_at
12.70304 AFFX-r2-P1-cre-2_at
(The above is a data frame with two columns: the signal intensity values in one and the probe names in the other.) I want to convert the data frame to the following form:
AFFX-r2-P1-cre-5_at 12.28280 12.35039 12.38397 12.36304 12.16271
AFFX-r2-P1-cre-3_at 12.28280 12.35039 12.38397 12.36304 12.16271
AFFX-r2-P1-cre-2_at 12.38304 12.36304 12.38397 12.16271 12.70304
Any suggestions are welcome. It is a large dataset, and I have shown only a fraction of it here.
Upvotes: 1
Views: 139
Reputation: 193517
This is how I would approach this problem:
1. Make sure the new.probes variable is sorted.
2. Use sequence() and rle() to generate a "time" variable for each new.probes.
3. Use reshape() to transform the data.
Here's a worked example with your sample data (assuming it's named "DF").
DF = DF[order(DF$new.probes), ]
DF$time = sequence(rle(as.vector(DF$new.probes))$lengths)
reshape(DF, direction = "wide", idvar = "new.probes", timevar = "time")
# new.probes inten.1 inten.2 inten.3 inten.4 inten.5
# 11 AFFX-r2-P1-cre-2_at 12.16271 12.70304 12.16271 12.70304 NA
# 6 AFFX-r2-P1-cre-3_at 12.70304 12.28280 12.35039 12.38397 12.36304
# 1 AFFX-r2-P1-cre-5_at 12.28280 12.35039 12.38397 12.36304 12.16271
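To make step 2 easier to follow, here is a self-contained sketch that rebuilds the sample data from the question and exposes the intermediate values. The data.frame() construction and the names `runs` and `wide` are my own additions for illustration, not part of the original data.

```r
# Sample data copied from the question (14 rows, 3 probe names)
DF <- data.frame(
  inten = c(12.28280, 12.35039, 12.38397, 12.36304, 12.16271,
            12.70304, 12.28280, 12.35039, 12.38397, 12.36304,
            12.16271, 12.70304, 12.16271, 12.70304),
  new.probes = rep(c("AFFX-r2-P1-cre-5_at", "AFFX-r2-P1-cre-3_at",
                     "AFFX-r2-P1-cre-2_at"), times = c(5, 5, 4)),
  stringsAsFactors = FALSE
)

# Step 1: sort so that the rows of each probe are contiguous
DF <- DF[order(DF$new.probes), ]

# Step 2: rle() gives the run length of each probe; sequence() expands
# each length n into the counter 1..n
runs <- rle(DF$new.probes)$lengths   # 4 5 5
DF$time <- sequence(runs)            # 1 2 3 4 1 2 3 4 5 1 2 3 4 5

# Step 3: one row per probe, one inten.<time> column per counter value
wide <- reshape(DF, direction = "wide", idvar = "new.probes", timevar = "time")
```

The counter is what lets reshape() line up the first, second, third, ... measurement of every probe in the same column; probes with fewer measurements get NA in the trailing columns.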
Or, if you prefer the syntax in reshape2 to base R's reshape, replace step 3 with:
require(reshape2)
dcast(DF, new.probes ~ time, value.var = "inten")
Upvotes: 1
Reputation: 179418
If you had the same number of elements for each value of new.probes, you could have used:
do.call(rbind, unstack(dat))
[,1] [,2] [,3] [,4] [,5]
AFFX-r2-P1-cre-2_at 12.16271 12.70304 12.16271 12.70304 12.16271
AFFX-r2-P1-cre-3_at 12.70304 12.28280 12.35039 12.38397 12.36304
AFFX-r2-P1-cre-5_at 12.28280 12.35039 12.38397 12.36304 12.16271
Warning message:
In function (..., deparse.level = 1) :
number of columns of result is not a multiple of vector length (arg 1)
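But this is clearly wrong: as the warning indicates, rbind() recycled the first value of the shorter vector to fill the last cell of its row. You need to pad the shorter vectors with NA: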
x <- unstack(dat)
m <- max(sapply(x, length))
do.call(rbind, lapply(x, function(v) c(v, rep(NA, m - length(v)))))
[,1] [,2] [,3] [,4] [,5]
AFFX-r2-P1-cre-2_at 12.16271 12.70304 12.16271 12.70304 NA
AFFX-r2-P1-cre-3_at 12.70304 12.28280 12.35039 12.38397 12.36304
AFFX-r2-P1-cre-5_at 12.28280 12.35039 12.38397 12.36304 12.16271
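As a variation on the padding step, the same result can be sketched with out-of-range indexing, since indexing a vector past its length returns NA. This is a swapped-in idiom rather than the code above; the data.frame() construction and the name `padded` are assumptions made so that the example runs on its own.

```r
# Sample data copied from the question (14 rows, 3 probe names)
dat <- data.frame(
  inten = c(12.28280, 12.35039, 12.38397, 12.36304, 12.16271,
            12.70304, 12.28280, 12.35039, 12.38397, 12.36304,
            12.16271, 12.70304, 12.16271, 12.70304),
  new.probes = rep(c("AFFX-r2-P1-cre-5_at", "AFFX-r2-P1-cre-3_at",
                     "AFFX-r2-P1-cre-2_at"), times = c(5, 5, 4)),
  stringsAsFactors = FALSE
)

# Split the intensities by probe; the groups have unequal lengths,
# so unstack() returns a list rather than a data frame
x <- unstack(dat, inten ~ new.probes)

# Length of the longest group
m <- max(lengths(x))

# v[1:m] yields NA for positions beyond length(v), which pads each
# group to length m; sapply() binds the groups as columns, t() flips
# them into one row per probe
padded <- t(sapply(x, `[`, seq_len(m)))
```

Here `padded` is a numeric matrix with one row per probe and a single NA at the end of the row for the probe that has only four measurements.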
Upvotes: 3