Reputation: 31
My data is set up as so:
site date amb ppm1 ppm2 ppm3 time0 time1 time2 time3
A 5/6/12 350 370 380 385 0 3 6 9
I need it in a format with 2 columns (one being concentration and the other time)
conc time
350 0
370 3
380 6
385 9
So that I can run a regression on it. Or help with how to run a regression on the original set up would be great.
Upvotes: 3
Views: 199
Reputation: 208
If your data is a single vector:
> mydata <- c("A", "5/6/12", 350, 370, 380, 385, 0, 3, 6, 9)
your names added:
> names(mydata) <- c("site", "date", "amb" ,"ppm1","ppm2","ppm3","time0","time1","time2","time3")
would kinda look like you describe above:
> mydata
site date amb ppm1 ppm2 ppm3 time0 time1 time2 time3
"A" "5/6/12" "350" "370" "380" "385" "0" "3" "6" "9"
and to transform it you could just do:
> data.frame(conc=mydata[3:6],time=mydata[7:10])
which would result in
conc time
amb 350 0
ppm1 370 3
ppm2 380 6
ppm3 385 9
Upvotes: 2
Reputation: 193527
Using your sample data, and assuming your data.frame
is called "mydf", you can use stack
for each "set" of columns to get the output you show:
setNames(data.frame(stack(mydf[, grep("^ppm|^amb", names(mydf))])[-2],
stack(mydf[, grep("^time", names(mydf))])[-2]),
c("conc", "time"))
# conc time
# 1 350 0
# 2 370 3
# 3 380 6
# 4 385 9
grep
was used, just as an example for if you have many similarly named columns and don't want to count to identify their column indexes. If this is truly representative of your data, stack
could also just be stack(mydf[, 3:6])[-2]
and stack(mydf[, 7:10])
.setNames
is just a convenience function to rename the column names in the output. [-2]
just removes the second column from each stack
command (which is a column of the column names from which the values were taken).Another option, if you don't mind changing the variable name of "abm" to "ppm0" would be to use reshape
:
names(mydf)[3] <- "ppm0"
reshape(mydf, direction = "long", idvar = 1:2,
timevar = "measure", varying = 3:ncol(mydf), sep = "")
# site date measure ppm time
# A.5/6/12.0 A 5/6/12 0 350 0
# A.5/6/12.1 A 5/6/12 1 370 3
# A.5/6/12.2 A 5/6/12 2 380 6
# A.5/6/12.3 A 5/6/12 3 385 9
You can, of course, drop the first three columns very easily.
Upvotes: 3
Reputation: 336
You should use regular expressions to split the string to get your two vectors (concentration and time). And if you are using R, you create a data frame simply by calling
data.frame(concentration=concentration,time=time)
on your two vectors.
Upvotes: 1