user1612622
user1612622

Reputation: 31

creating a dataframe

My data is set up as so:

site    date      amb   ppm1   ppm2   ppm3   time0   time1   time2   time3
A       5/6/12     350   370    380   385     0       3        6       9

I need it in a format with 2 columns (one being concentration and the other time)

conc   time
350      0
370      3
380      6
385      9

So that I can run a regression on it. Or help with how to run a regression on the original set up would be great.

Upvotes: 3

Views: 199

Answers (3)

user2276428
user2276428

Reputation: 208

If your data is a single vector:

> mydata <- c("A", "5/6/12", 350, 370, 380, 385, 0, 3, 6, 9)

your names added:

> names(mydata) <- c("site", "date", "amb" ,"ppm1","ppm2","ppm3","time0","time1","time2","time3")

would kinda look like you describe above:

> mydata
  site     date      amb     ppm1     ppm2     ppm3    time0    time1    time2    time3 
  "A"  "5/6/12"    "350"    "370"    "380"    "385"      "0"      "3"      "6"      "9"

and to transform it you could just do:

> data.frame(conc=mydata[3:6],time=mydata[7:10])

which would result in

       conc time
  amb   350    0
  ppm1  370    3
  ppm2  380    6
  ppm3  385    9

Upvotes: 2

A5C1D2H2I1M1N2O1R2T1
A5C1D2H2I1M1N2O1R2T1

Reputation: 193527

Using your sample data, and assuming your data.frame is called "mydf", you can use stack for each "set" of columns to get the output you show:

setNames(data.frame(stack(mydf[, grep("^ppm|^amb", names(mydf))])[-2], 
                    stack(mydf[, grep("^time", names(mydf))])[-2]), 
         c("conc", "time"))
#   conc time
# 1  350    0
# 2  370    3
# 3  380    6
# 4  385    9
  • grep was used, just as an example for if you have many similarly named columns and don't want to count to identify their column indexes. If this is truly representative of your data, stack could also just be stack(mydf[, 3:6])[-2] and stack(mydf[, 7:10]).
  • setNames is just a convenience function to rename the column names in the output.
  • The [-2] just removes the second column from each stack command (which is a column of the column names from which the values were taken).

Another option, if you don't mind changing the variable name of "abm" to "ppm0" would be to use reshape:

names(mydf)[3] <- "ppm0"
reshape(mydf, direction = "long", idvar = 1:2, 
        timevar = "measure", varying = 3:ncol(mydf), sep = "")
#            site   date measure ppm time
# A.5/6/12.0    A 5/6/12       0 350    0
# A.5/6/12.1    A 5/6/12       1 370    3
# A.5/6/12.2    A 5/6/12       2 380    6
# A.5/6/12.3    A 5/6/12       3 385    9

You can, of course, drop the first three columns very easily.

Upvotes: 3

BobbyZ
BobbyZ

Reputation: 336

You should use regular expressions to split the string to get your two vectors (concentration and time). And if you are using R, you create a data frame simply by calling

data.frame(concentration=concentration,time=time) 

on your two vectors.

Upvotes: 1

Related Questions