add variable to a list in R

Question

I have a list of 28 data frames nested inside another list, and I am trying to add a variable called ID to each individual data frame. I found the question "Dataframes in a list; adding a new variable with name of dataframe" very helpful, but when I tried its code, it didn't work in my case. I think that is because the elements of my list don't have clear labels ([1], [2], [3], etc.) that the code can recognize.

all$id <- rep(names(mylist), sapply(mylist, nrow))


List of 1
 $ :List of 28
  ..$ :'data.frame':    271 obs. of  12 variables:
  .. ..$ Sample_ID                 : Factor w/ 271 levels "MC25",..: 19 27 2
  .. ..$ Reported_Analyte          : Factor w/ 10 levels "2-Butoxyethanol",..: 7 7 7
  .. ..$ Date_Collected            : Factor w/ 71 levels "2010-05-08","2010-05-09",..: 8 9 1
  .. ..$ Result2                   : num [1:271] 0.11 0.11 0.11 0.11
  ..$ :'data.frame':    6 obs. of  12 variables:
  .. ..$ Sample_ID                 : Factor w/ 271 levels "MC25",..: 19 27 2
  .. ..$ Reported_Analyte          : Factor w/ 10 levels "2-Butoxyethanol",..: 7 7 7
  .. ..$ Date_Collected            : Factor w/ 71 levels "2010-05-08","2010-05-09",..: 8 9 1
  .. ..$ Result2                   : num [1:271] 0.11 0.11 0.11 0.11

Gavin Simpson · Accepted Answer

It isn't very clear what you want to achieve. The post you linked to was about collapsing a list of data frames into a single data frame and adding to the collapsed version an ID variable indicating which original data frame each row came from.

I see a complication with your data: you have a list of 28 data frames within a list. You can see that in the output from str() given in your question. It is easier to see with this example data set (here all the data frames are the same, but that is just for expedience):

set.seed(42)
dat <- data.frame(Sample_ID = factor(sample(10)),
                  Reported_Analyte = factor(sample(LETTERS, 10)),
                  Date_Collected = Sys.Date() + 0:9,
                  Result2 = rnorm(10))

mylist <- list(lapply(1:28, function(x) dat))

If we look at mylist using str() we see the nature of the complication I mentioned:

R> str(mylist, max.level = 2)
List of 1
 $ :List of 28
  ..$ :'data.frame':  10 obs. of  4 variables:
  ..$ :'data.frame':  10 obs. of  4 variables:
  ..$ :'data.frame':  10 obs. of  4 variables:
  ..$ :'data.frame':  10 obs. of  4 variables:
  ..$ :'data.frame':  10 obs. of  4 variables:
  ..$ :'data.frame':  10 obs. of  4 variables:
  ..$ :'data.frame':  10 obs. of  4 variables:
....

The post you linked to started from the equivalent of the list inside your outer list, and that list had named components. If you don't need the outer list, it is perhaps best to throw it away at this stage:

mylist2 <- mylist[[1]]
## the `[[` are important as we want the 1st component *inside* the list
## using `[` would just give us a list within a list again.
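
To make the difference between the two operators concrete, here is a small sketch on a toy nested list. The names `nested`, `inner_single`, and `inner_double` are made up for illustration and are not part of your data.

```r
# Toy nested list: an outer list holding one inner list of two data frames
nested <- list(list(data.frame(a = 1:2), data.frame(a = 3:5)))

inner_single <- nested[1]    # `[` keeps the outer wrapper: still a list of 1
inner_double <- nested[[1]]  # `[[` extracts the inner list of 2 data frames

length(inner_single)              # 1
length(inner_double)              # 2
is.data.frame(inner_double[[2]])  # TRUE
```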

Names can then be added to this list

names(mylist2) <- paste("Data_frame_", seq_along(mylist2), sep = "")

which would result in

R> str(mylist2)
List of 28
 $ Data_frame_1 :'data.frame':  10 obs. of  4 variables:
  ..$ Sample_ID       : Factor w/ 10 levels "1","2","3","4",..: 10 9 3 6 4 8 5 1 2 7
  ..$ Reported_Analyte: Factor w/ 10 levels "C","F","I","J",..: 6 7 10 2 5 8 9 1 3 4
  ..$ Date_Collected  : Date[1:10], format: "2012-05-02" "2012-05-03" ...
  ..$ Result2         : num [1:10] 1.305 2.287 -1.389 -0.279 -0.133 ...
 $ Data_frame_2 :'data.frame':  10 obs. of  4 variables:
  ..$ Sample_ID       : Factor w/ 10 levels "1","2","3","4",..: 10 9 3 6 4 8 5 1 2 7
  ..$ Reported_Analyte: Factor w/ 10 levels "C","F","I","J",..: 6 7 10 2 5 8 9 1 3 4
  ..$ Date_Collected  : Date[1:10], format: "2012-05-02" "2012-05-03" ...
  ..$ Result2         : num [1:10] 1.305 2.287 -1.389 -0.279 -0.133 ...
....

Notice the List of 1 is no longer reported.

If the list of data frames within a list is important to you (not sure why it would be, but OK), then you can assign the names to the [[1]]st component directly.

names(mylist[[1]]) <- paste("Data_frame_", seq_along(mylist[[1]]), sep = "")

(Notice I'm using the original mylist and on both occasions I index that list with [[1]].)

The result is similar to the above though the list within a list structure is retained:

R> str(mylist)
List of 1
 $ :List of 28
  ..$ Data_frame_1 :'data.frame':   10 obs. of  4 variables:
  .. ..$ Sample_ID       : Factor w/ 10 levels "1","2","3","4",..: 10 9 3 6 4 8 5 1 2 7
  .. ..$ Reported_Analyte: Factor w/ 10 levels "C","F","I","J",..: 6 7 10 2 5 8 9 1 3 4
  .. ..$ Date_Collected  : Date[1:10], format: "2012-05-02" "2012-05-03" ...
  .. ..$ Result2         : num [1:10] 1.305 2.287 -1.389 -0.279 -0.133 ...
  ..$ Data_frame_2 :'data.frame':   10 obs. of  4 variables:
  .. ..$ Sample_ID       : Factor w/ 10 levels "1","2","3","4",..: 10 9 3 6 4 8 5 1 2 7
  .. ..$ Reported_Analyte: Factor w/ 10 levels "C","F","I","J",..: 6 7 10 2 5 8 9 1 3 4
  .. ..$ Date_Collected  : Date[1:10], format: "2012-05-02" "2012-05-03" ...
  .. ..$ Result2         : num [1:10] 1.305 2.287 -1.389 -0.279 -0.133 ...
....

If you now wish to proceed with collapsing the individual data frames into a single data frame, but retaining the information about which data frame they came from, we would do this for mylist2:

all2 <- do.call("rbind", mylist2)
all2 <- transform(all2, id = rep(names(mylist2), sapply(mylist2, nrow)))
rownames(all2) <- seq_len(nrow(all2)) ## reset rownames for compactness

which gives:

R> head(all2)
  Sample_ID Reported_Analyte Date_Collected    Result2           id
1        10                L     2012-05-02  1.3048697 Data_frame_1
2         9                R     2012-05-03  2.2866454 Data_frame_1
3         3                W     2012-05-04 -1.3888607 Data_frame_1
4         6                F     2012-05-05 -0.2787888 Data_frame_1
5         4                K     2012-05-06 -0.1333213 Data_frame_1
6         8                T     2012-05-07  0.6359504 Data_frame_1
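
The `rep()` call works because its second argument is a vector of repeat counts, one per data frame, so each name is repeated as many times as that data frame has rows. A minimal sketch with made-up data frames of unequal size (`toy`, `n_rows`, and `toy_all` are hypothetical names) shows that this holds even when the data frames differ in length:

```r
toy <- list(first  = data.frame(x = 1:3),
            second = data.frame(x = 4:5))

# One count per data frame: first = 3, second = 2
n_rows <- sapply(toy, nrow)

toy_all <- do.call("rbind", toy)
toy_all$id <- rep(names(toy), n_rows)
rownames(toy_all) <- NULL  # drop the "first.1", "first.2", ... row names

toy_all$id
# "first" "first" "first" "second" "second"
```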

For mylist we use something very similar, but just index into mylist using [[1]]:

all1 <- do.call("rbind", mylist[[1]])
all1 <- transform(all1, id = rep(names(mylist[[1]]), sapply(mylist[[1]], nrow)))
rownames(all1) <- seq_len(nrow(all1)) ## reset rownames for compactness

R> head(all1)
  Sample_ID Reported_Analyte Date_Collected    Result2           id
1        10                L     2012-05-02  1.3048697 Data_frame_1
2         9                R     2012-05-03  2.2866454 Data_frame_1
3         3                W     2012-05-04 -1.3888607 Data_frame_1
4         6                F     2012-05-05 -0.2787888 Data_frame_1
5         4                K     2012-05-06 -0.1333213 Data_frame_1
6         8                T     2012-05-07  0.6359504 Data_frame_1

As you can see, repeatedly having to refer to your list of data frames as mylist[[1]] is a pain if you don't need the outer list.

Update:

If you don't want to collapse the list into a single data frame, see @Andrie's answer, but modify it to read:

ml2 <- ml  ## `ml` is the list within a list (`mylist` above)
ml2[[1]] <- lapply(seq_along(ml[[1]]), function(x) cbind(ml[[1]][[x]], id = x))

so you account for the list within list structure.
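
If you would rather label each data frame with its name than with its position, a variant along these lines should work. This is only a sketch on toy data, not part of @Andrie's answer; `toy`, `add_id`, and `toy_id` are names invented here.

```r
# Toy list within a list, with named inner data frames
toy <- list(list(Data_frame_1 = data.frame(x = 1:2),
                 Data_frame_2 = data.frame(x = 3:5)))

# Hypothetical helper: attach a constant id column to one data frame
add_id <- function(df, id) {
  df$id <- rep(id, nrow(df))
  df
}

toy_id <- toy
# Map() walks the data frames and their names in parallel and
# keeps the names of the inner list on the result
toy_id[[1]] <- Map(add_id, toy[[1]], names(toy[[1]]))

str(toy_id, max.level = 2)
```

Unlike the `lapply(seq_along(...))` version, this keeps the names on the inner list, at the cost of requiring that the names have been assigned first.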
