Amateur

Reputation: 1277

add variable to a list in R

I have a list of 28 data frames nested inside another list, and I am trying to add a variable called ID to each data frame. I found the question "Dataframes in a list; adding a new variable with name of dataframe" very helpful, but when I tried its code, it didn't work in my case. I think that's because my list doesn't have clear labels ([1], [2], [3], etc.) that the code can recognize.

all$id <- rep(names(mylist), sapply(mylist, nrow))


List of 1
 $ :List of 28
  ..$ :'data.frame':    271 obs. of  12 variables:
  .. ..$ Sample_ID                 : Factor w/ 271 levels "MC25",..: 19 27 2
  .. ..$ Reported_Analyte          : Factor w/ 10 levels "2-Butoxyethanol",..: 7 7 7
  .. ..$ Date_Collected            : Factor w/ 71 levels "2010-05-08","2010-05-09",..: 8 9 1
  .. ..$ Result2                   : num [1:271] 0.11 0.11 0.11 0.11
  ..$ :'data.frame':    6 obs. of  12 variables:
  .. ..$ Sample_ID                 : Factor w/ 271 levels "MC25",..: 19 27 2
  .. ..$ Reported_Analyte          : Factor w/ 10 levels "2-Butoxyethanol",..: 7 7 7
  .. ..$ Date_Collected            : Factor w/ 71 levels "2010-05-08","2010-05-09",..: 8 9 1
  .. ..$ Result2                   : num [1:271] 0.11 0.11 0.11 0.11

Upvotes: 1

Views: 6475

Answers (2)

Gavin Simpson

Reputation: 174788

It really isn't very clear what you want to achieve (the post you linked to was about collapsing over the list of data frames and adding into the collapsed version an ID variable indicating which original data frame each row in the collapsed data frame came from).

I see a complication with your data: you have a list of 28 data frames within a list. You can see that in the output from str() given in your question. It is easier to see with this example data set (here all the data frames are the same, but that is just for expedience):

set.seed(42)
dat <- data.frame(Sample_ID = factor(sample(10)),
                  Reported_Analyte = factor(sample(LETTERS, 10)),
                  Date_Collected = Sys.Date() + 0:9,
                  Result2 = rnorm(10))

mylist <- list(lapply(1:28, function(x) dat))

If we look at mylist using str() we see the nature of the complication I mentioned:

R> str(mylist, max.level = 2)
List of 1
 $ :List of 28
  ..$ :'data.frame':  10 obs. of  4 variables:
  ..$ :'data.frame':  10 obs. of  4 variables:
  ..$ :'data.frame':  10 obs. of  4 variables:
  ..$ :'data.frame':  10 obs. of  4 variables:
  ..$ :'data.frame':  10 obs. of  4 variables:
  ..$ :'data.frame':  10 obs. of  4 variables:
  ..$ :'data.frame':  10 obs. of  4 variables:
....<etc>

The post you linked to started from the equivalent of the list inside your outer list, and that list had named components. If you don't need the outer list, it is perhaps best to throw it away at this stage:

mylist2 <- mylist[[1]]
## the `[[` are important as we want the 1st component *inside* the list
## using `[` would just give us a list within a list again.
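
The difference between `[[` and `[` can be checked directly on a small stand-in list. This is only a sketch: `nested`, `inner` and `still_nested` are illustrative names and are not part of the question's data.

```r
# A stand-in for the nested structure: 3 data frames inside an outer list
nested <- list(lapply(1:3, function(i) data.frame(x = 1:2)))

inner <- nested[[1]]        # extracts the inner list of data frames
still_nested <- nested[1]   # a list of length 1 that still wraps the inner list

length(inner)              # number of data frames in the inner list
length(still_nested)       # length of the outer wrapper
is.data.frame(inner[[1]])  # the first element of the inner list is a data frame
```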

Names can then be added to this list

names(mylist2) <- paste("Data_frame_", seq_along(mylist2), sep = "")

which would result in

R> str(mylist2)
List of 28
 $ Data_frame_1 :'data.frame':  10 obs. of  4 variables:
  ..$ Sample_ID       : Factor w/ 10 levels "1","2","3","4",..: 10 9 3 6 4 8 5 1 2 7
  ..$ Reported_Analyte: Factor w/ 10 levels "C","F","I","J",..: 6 7 10 2 5 8 9 1 3 4
  ..$ Date_Collected  : Date[1:10], format: "2012-05-02" "2012-05-03" ...
  ..$ Result2         : num [1:10] 1.305 2.287 -1.389 -0.279 -0.133 ...
 $ Data_frame_2 :'data.frame':  10 obs. of  4 variables:
  ..$ Sample_ID       : Factor w/ 10 levels "1","2","3","4",..: 10 9 3 6 4 8 5 1 2 7
  ..$ Reported_Analyte: Factor w/ 10 levels "C","F","I","J",..: 6 7 10 2 5 8 9 1 3 4
  ..$ Date_Collected  : Date[1:10], format: "2012-05-02" "2012-05-03" ...
  ..$ Result2         : num [1:10] 1.305 2.287 -1.389 -0.279 -0.133 ...
....<etc>

Notice the List of 1 is no longer reported.

If the list of data frames within a list is important to you (not sure why it would be, but OK), then you can assign the names to the [[1]]st component directly.

names(mylist[[1]]) <- paste("Data_frame_", seq_along(mylist[[1]]), sep = "")

(Notice I'm using the original mylist and on both occasions I index that list with [[1]].)

The result is similar to the above though the list within a list structure is retained:

R> str(mylist)
List of 1
 $ :List of 28
  ..$ Data_frame_1 :'data.frame':   10 obs. of  4 variables:
  .. ..$ Sample_ID       : Factor w/ 10 levels "1","2","3","4",..: 10 9 3 6 4 8 5 1 2 7
  .. ..$ Reported_Analyte: Factor w/ 10 levels "C","F","I","J",..: 6 7 10 2 5 8 9 1 3 4
  .. ..$ Date_Collected  : Date[1:10], format: "2012-05-02" "2012-05-03" ...
  .. ..$ Result2         : num [1:10] 1.305 2.287 -1.389 -0.279 -0.133 ...
  ..$ Data_frame_2 :'data.frame':   10 obs. of  4 variables:
  .. ..$ Sample_ID       : Factor w/ 10 levels "1","2","3","4",..: 10 9 3 6 4 8 5 1 2 7
  .. ..$ Reported_Analyte: Factor w/ 10 levels "C","F","I","J",..: 6 7 10 2 5 8 9 1 3 4
  .. ..$ Date_Collected  : Date[1:10], format: "2012-05-02" "2012-05-03" ...
  .. ..$ Result2         : num [1:10] 1.305 2.287 -1.389 -0.279 -0.133 ...
....<etc>

If you now wish to proceed with collapsing the individual data frames into a single data frame, but retaining the information about which data frame they came from, we would do this for mylist2:

all2 <- do.call("rbind", mylist2)
all2 <- transform(all2, id = rep(names(mylist2), sapply(mylist2, nrow)))
rownames(all2) <- seq_len(nrow(all2)) ## reset rownames for compactness

which gives:

R> head(all2)
  Sample_ID Reported_Analyte Date_Collected    Result2           id
1        10                L     2012-05-02  1.3048697 Data_frame_1
2         9                R     2012-05-03  2.2866454 Data_frame_1
3         3                W     2012-05-04 -1.3888607 Data_frame_1
4         6                F     2012-05-05 -0.2787888 Data_frame_1
5         4                K     2012-05-06 -0.1333213 Data_frame_1
6         8                T     2012-05-07  0.6359504 Data_frame_1
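
Because every data frame in the example has 10 rows, it may not be obvious why `sapply(..., nrow)` is needed. A minimal sketch with data frames of unequal size (as in the question, where one has 271 rows and another 6) shows each name being repeated once per row of its own data frame. The names `dfs` and `combined` are made up for the illustration.

```r
# Illustrative named list of data frames with different numbers of rows
dfs <- list(a = data.frame(x = 1:3), b = data.frame(x = 4:5))

combined <- do.call("rbind", dfs)
# rep() repeats each name as many times as its data frame has rows
combined$id <- rep(names(dfs), sapply(dfs, nrow))
rownames(combined) <- NULL  # reset row names to 1..n

combined
```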

For mylist we use something very similar, but just index into mylist using [[1]]:

all1 <- do.call("rbind", mylist[[1]])
all1 <- transform(all1, id = rep(names(mylist[[1]]), sapply(mylist[[1]], nrow)))
rownames(all1) <- seq_len(nrow(all1)) ## reset rownames for compactness

R> head(all1)
  Sample_ID Reported_Analyte Date_Collected    Result2           id
1        10                L     2012-05-02  1.3048697 Data_frame_1
2         9                R     2012-05-03  2.2866454 Data_frame_1
3         3                W     2012-05-04 -1.3888607 Data_frame_1
4         6                F     2012-05-05 -0.2787888 Data_frame_1
5         4                K     2012-05-06 -0.1333213 Data_frame_1
6         8                T     2012-05-07  0.6359504 Data_frame_1

As you can see, repeatedly having to refer to your list of data frames as mylist[[1]] is a pain if you don't need the outer list.

Update:

If you don't want to collapse the list into a single data frame, see @Andrie's answer, but modify it to read:

ml2 <- ml  ## copy the nested list (here ml is a list containing a list of data frames)
ml2[[1]] <- lapply(seq_along(ml[[1]]), function(x) cbind(ml[[1]][[x]], id = x))

so you account for the list within list structure.
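
As a self-contained check of the nested variant, here is a sketch that assumes `ml` is itself a list wrapping a list of data frames (unlike the flat `ml` in @Andrie's answer); the small `mtcars` slices are only stand-ins for the real data.

```r
# Illustrative nested list: 3 small data frames inside an outer list
ml <- list(lapply(1:3, function(i) head(mtcars[, 1:3], 2)))

ml2 <- ml
# Replace the inner list with copies that each carry an id column
ml2[[1]] <- lapply(seq_along(ml[[1]]), function(x) cbind(ml[[1]][[x]], id = x))

# One distinct id per data frame, and the outer list is still of length 1
sapply(ml2[[1]], function(d) unique(d$id))
length(ml2)
```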

Upvotes: 3

Andrie

Reputation: 179398

I answer this using a constructed example of a list with samples from mtcars.

First, create a list of data frames. Do this by sampling 10 rows from mtcars for each element of the list:

ml <- lapply(1:3, function(x)mtcars[sample(1:32, 10), 1:3])

So, now you have an unnamed list of 3 data frames. Next you want to add an id column. The trick is to use lapply over a sequence of list items using seq_along(ml), and then to cbind your id to each data frame:

ml2 <- lapply(seq_along(ml), function(x)cbind(ml[[x]], id=x))

The results are what you required:

str(ml2)
List of 3
 $ :'data.frame':   10 obs. of  4 variables:
  ..$ mpg : num [1:10] 15 24.4 26 15.8 22.8 21 32.4 17.3 17.8 30.4
  ..$ cyl : num [1:10] 8 4 4 8 4 6 4 8 6 4
  ..$ disp: num [1:10] 301 147 120 351 108 ...
  ..$ id  : int [1:10] 1 1 1 1 1 1 1 1 1 1
 $ :'data.frame':   10 obs. of  4 variables:
  ..$ mpg : num [1:10] 33.9 19.2 24.4 10.4 30.4 22.8 16.4 21.4 15.5 21.5
  ..$ cyl : num [1:10] 4 6 4 8 4 4 8 6 8 4
  ..$ disp: num [1:10] 71.1 167.6 146.7 460 75.7 ...
  ..$ id  : int [1:10] 2 2 2 2 2 2 2 2 2 2
 $ :'data.frame':   10 obs. of  4 variables:
  ..$ mpg : num [1:10] 15.5 21 13.3 21.5 21.4 30.4 21 18.1 30.4 15.2
  ..$ cyl : num [1:10] 8 6 8 4 4 4 6 6 4 8
  ..$ disp: num [1:10] 318 160 350 120 121 ...
  ..$ id  : int [1:10] 3 3 3 3 3 3 3 3 3 3
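
If the list does have names, a possible variation (not part of the approach above, just a sketch) is to carry the name rather than the position as the id, using `Map()` to walk over the data frames and their names together. The list names `first` and `second` are invented for the example.

```r
# Illustrative named list of two small data frames
ml <- list(first = head(mtcars[, 1:3], 2), second = tail(mtcars[, 1:3], 3))

# Pair each data frame with its name and bind the name on as an id column
ml2 <- Map(function(d, nm) cbind(d, id = nm), ml, names(ml))

str(ml2)
```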

Upvotes: 3
