user12

Reputation: 91

Manipulating columns of a list of dataframes in R

I have a list of data frames. I want to add a column to each data frame that is the concatenation of the row number and another variable.

I have managed to do this with a for loop, but it takes a long time on a large dataset. Is there a way to avoid the for loop?

my_data_vcf <- lapply(my_vcf_files, read.table, stringsAsFactors = FALSE)
for (i in 1:length(my_data_vcf)) {
  for (j in 1:length(my_data_vcf[[i]])) {
    my_data_vcf[[i]] <- cbind(my_data_vcf[[i]], "Id" = paste(c(variable, j), collapse = "_"))
  }
}

Upvotes: 0

Views: 1442

Answers (2)

tyluRp

Reputation: 4768

One way we can do this is to create a nested data frame using enframe from the tibble package. Once we've done that, we can unnest the data and use mutate to concatenate the row number and a column:

library(tidyverse)

# using Maurits Evers' sample data, with stringsAsFactors = FALSE
lst <- list(
  data.frame(one = letters[1:10], two = 1:10, stringsAsFactors = FALSE),
  data.frame(one = letters[11:20], two = 11:20, stringsAsFactors = FALSE)
)

lst %>% 
  enframe() %>% 
  unnest(value) %>% 
  group_by(name) %>% 
  mutate(three = paste(row_number(), two, sep = "_")) %>% 
  nest()

Returns:

# A tibble: 2 x 2
   name data             
  <int> <list>           
1     1 <tibble [10 × 3]>
2     2 <tibble [10 × 3]>

If we unnest the data, we can see that column three is the row number concatenated with column two:

lst %>% 
  enframe() %>% 
  unnest(value) %>% 
  group_by(name) %>% 
  mutate(three = paste(row_number(), two, sep = "_")) %>% 
  nest() %>% 
  unnest(data)

Returns:

# A tibble: 20 x 4
    name one     two three
   <int> <chr> <int> <chr>
 1     1 a         1 1_1  
 2     1 b         2 2_2  
 3     1 c         3 3_3  
 4     1 d         4 4_4  
 5     1 e         5 5_5  
 6     1 f         6 6_6  
 7     1 g         7 7_7  
 8     1 h         8 8_8  
 9     1 i         9 9_9  
10     1 j        10 10_10
11     2 k        11 1_11 
12     2 l        12 2_12 
13     2 m        13 3_13 
14     2 n        14 4_14 
15     2 o        15 5_15 
16     2 p        16 6_16 
17     2 q        17 7_17 
18     2 r        18 8_18 
19     2 s        19 9_19 
20     2 t        20 10_20
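
If the result should stay a plain list of data frames rather than a nested tibble, one possible alternative is to apply mutate to each element with map from the purrr package. This is only a sketch on the same sample data; the name res is just an illustrative choice:

```r
library(dplyr)
library(purrr)

lst <- list(
  data.frame(one = letters[1:10], two = 1:10, stringsAsFactors = FALSE),
  data.frame(one = letters[11:20], two = 11:20, stringsAsFactors = FALSE)
)

# Apply the same mutate() to every data frame; the result is still a list
res <- map(lst, ~ mutate(.x, three = paste(row_number(), two, sep = "_")))

# Row numbers restart at 1 in each data frame
res[[2]]$three[1:3]
```

Because each data frame is processed on its own, no group_by is needed to make the row numbers restart per element.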

Upvotes: 0

Maurits Evers

Reputation: 50738

You can use lapply; since you don't provide a minimal sample dataset, I'm generating some sample data.

# Sample list of data frames
lst  <- list(
    data.frame(one = letters[1:10], two = 1:10),
    data.frame(one = letters[11:20], two = 11:20))

# Concatenate row number with entries in second column
lapply(lst, function(x) { x$three <- paste(seq_len(nrow(x)), x$two, sep = "_"); x })
#[[1]]
#   one two three
#1    a   1   1_1
#2    b   2   2_2
#3    c   3   3_3
#4    d   4   4_4
#5    e   5   5_5
#6    f   6   6_6
#7    g   7   7_7
#8    h   8   8_8
#9    i   9   9_9
#10   j  10 10_10
#
#[[2]]
#   one two three
#1    k  11  1_11
#2    l  12  2_12
#3    m  13  3_13
#4    n  14  4_14
#5    o  15  5_15
#6    p  16  6_16
#7    q  17  7_17
#8    r  18  8_18
#9    s  19  9_19
#10   t  20 10_20    
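
Note that lapply returns a new list and leaves lst unchanged, so the result has to be assigned to be kept. A minimal sketch of that, assuming the same sample data; add_row_id, its arguments, and lst2 are hypothetical names used only for illustration:

```r
# Sample list of data frames (same as above)
lst <- list(
    data.frame(one = letters[1:10], two = 1:10),
    data.frame(one = letters[11:20], two = 11:20))

# Hypothetical helper: paste the row number and the values of column `col`
add_row_id <- function(x, col, new_col = "three") {
    x[[new_col]] <- paste(seq_len(nrow(x)), x[[col]], sep = "_")
    x
}

# Assign the result; the original lst is not modified
lst2 <- lapply(lst, add_row_id, col = "two")

lst2[[2]]$three[1:3]
#[1] "1_11" "2_12" "3_13"
```

Passing the column name as an argument makes it easy to reuse the same helper with whichever column of the real data should be combined with the row number.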

Upvotes: 3
