How to select the same columns from many files

Question

I have many text files. I want to load them all and then build a new matrix from certain columns of every file.

For example, suppose some of the matrices are as follows:

# names that start with a digit must be wrapped in backticks in R
`1a` <- replicate(10, rnorm(20))
`1b` <- replicate(10, rnorm(19))
`2a` <- replicate(10, rnorm(18))
`2b` <- replicate(10, rnorm(15))

To find them, I put them all in one folder, set my working directory there, and then get the list of them like this:

filelist = list.files(pattern = ".*.txt")

Then I want to put the first column of 1a together with its V6 and V7 into a new matrix, then V6 and V7 from 1b into a new matrix, then V6 and V7 from 2a, and then V6 and V7 from 2b.

The files do not all have the same length (their numbers of rows differ). I would like to do two things:

1- Save each file with only the selected columns, adding an R to its name. For example, if the original file is 1a, select V6 and V7 and save a new file with only those 2 columns, named 1aR.

2- Make a new matrix and put all the selected columns in it (where the lengths are not equal, we can fill with NA or 0).

akrun · Accepted Answer

Here is an option to read the files, select the concerned columns from the dataset, and create a new dataset.

We get the files that follow a particular file name pattern in the working directory using list.files.

filelist <- list.files(pattern='\\d+[^0-9]+\\.txt', full.names=TRUE)
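To see what this pattern keeps and what it drops, here is a small sketch that applies the same regular expression to a vector of made-up file names with grepl. The names in `fnames` are hypothetical and only serve as an illustration.

```r
# Hypothetical file names, used only to illustrate what the pattern matches
fnames <- c("1a.txt", "1b.txt", "2a.txt", "notes.txt", "12.txt", "3c.csv")

# One or more digits, then one or more non-digits, then a literal ".txt"
matched <- grepl("\\d+[^0-9]+\\.txt", fnames)

fnames[matched]
# "1a.txt" "1b.txt" "2a.txt"
```

Note that the backslashes have to be doubled inside an R string, and that the pattern is not anchored, so appending `$` may be worth considering if the folder also holds names such as "1a.txt.bak".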

Then, read all the files into a list using either read.csv/read.table or fread from data.table.

lst <- lapply(filelist, read.csv, header=TRUE, stringsAsFactors=FALSE)

Extract the V6 and V7 columns (selected by name) from each element of 'lst'

lst1 <- lapply(lst, "[", c("V6", "V7"))
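A minimal sketch of this step on the small reproducible list from the data section of this answer. Because the columns are picked by name, it does not matter that V6 and V7 sit in different positions in the two data frames.

```r
# Sample data copied from the 'data' section of this answer
lst <- list(data.frame(V1 = 21:30, V6 = 1:10, V7 = 11:20),
            data.frame(V6 = 1:3, V7 = 11:13, V1 = 21:23))

# "[" is the subsetting function; lapply passes the column names to it
lst1 <- lapply(lst, "[", c("V6", "V7"))

# Each element keeps only V6 and V7, with its own number of rows
sapply(lst1, nrow)
# 10 3
```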

If the data.frame elements in the list have unequal numbers of rows, one option is cbind.fill from library(rowr)

library(rowr)
res <- cbind.fill(lst[[1]][1], do.call(cbind.fill, 
           c(lst1, list(fill=NA))), fill=NA)
res 
#   V1 V6 V7 V6.1 V7.1
#1  21  1 11    1   11
#2  22  2 12    2   12
#3  23  3 13    3   13
#4  24  4 14   NA   NA
#5  25  5 15   NA   NA
#6  26  6 16   NA   NA
#7  27  7 17   NA   NA
#8  28  8 18   NA   NA
#9  29  9 19   NA   NA
#10 30 10 20   NA   NA
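If rowr is not available in your R installation, the same padding can be sketched in base R. This is only one possible alternative, not what cbind.fill does internally; `pad_rows` is a hypothetical helper name introduced here for illustration.

```r
# Sample data copied from the 'data' section of this answer
lst <- list(data.frame(V1 = 21:30, V6 = 1:10, V7 = 11:20),
            data.frame(V6 = 1:3, V7 = 11:13, V1 = 21:23))
lst1 <- lapply(lst, "[", c("V6", "V7"))

# Hypothetical helper: extend a data frame to n rows; the extra rows are NA
pad_rows <- function(d, n) {
  out <- d[seq_len(n), , drop = FALSE]  # out-of-range row indices give NA rows
  rownames(out) <- NULL
  out
}

n_max  <- max(sapply(lst1, nrow))
padded <- lapply(lst1, pad_rows, n = n_max)

# First column of the first file, followed by the padded V6/V7 pairs
res_base <- cbind(lst[[1]]["V1"], do.call(cbind, padded))
names(res_base) <- make.unique(names(res_base))  # V6, V7, V6.1, V7.1
res_base
```

To fill with 0 instead of NA, as the question also allows, `res_base[is.na(res_base)] <- 0` could be applied afterwards, at the cost of no longer distinguishing padding from real zeros.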

Then, we write the file as .txt

write.table(res, 'CombinedV6_V7.txt', row.names=FALSE, quote=FALSE)
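The first request in the question, saving each file with only the selected columns under a name with an added R, could be sketched as follows. The demo creates two small tab-separated files in a temporary folder so that it runs on its own; the file names, the tab separator, and the choice of "1aR.txt" as the output name are all assumptions made for illustration.

```r
# Self-contained demo: write two small tab-separated files to a temp folder
tmp <- tempfile("demo")
dir.create(tmp)
write.table(data.frame(V1 = 21:30, V6 = 1:10, V7 = 11:20),
            file.path(tmp, "1a.txt"), sep = "\t", row.names = FALSE, quote = FALSE)
write.table(data.frame(V6 = 1:3, V7 = 11:13, V1 = 21:23),
            file.path(tmp, "1b.txt"), sep = "\t", row.names = FALSE, quote = FALSE)

# Read the files and keep only V6 and V7, as in the steps above
filelist <- list.files(tmp, pattern = "\\d+[^0-9]+\\.txt$", full.names = TRUE)
lst  <- lapply(filelist, read.delim, stringsAsFactors = FALSE)
lst1 <- lapply(lst, "[", c("V6", "V7"))

# Build the output names: "1a.txt" becomes "1aR.txt" in the same folder
outnames <- file.path(dirname(filelist),
                      sub("\\.txt$", "R.txt", basename(filelist)))

# Write each reduced data frame next to its original file
for (i in seq_along(lst1)) {
  write.table(lst1[[i]], outnames[i], sep = "\t", row.names = FALSE, quote = FALSE)
}
basename(outnames)
# "1aR.txt" "1bR.txt"
```

One caveat with this naming scheme: a later call to list.files with the same pattern would also pick up the new "1aR.txt" files, so writing them to a separate output folder may be the safer design.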

Update

Using the data from the link

lst <- lapply(filelist, read.csv, sep='\t',
              header=TRUE, stringsAsFactors=FALSE)
lst1 <- lapply(lst, "[", c("Time", "X220"))
res <- do.call(cbind.fill, c(lst1, list(fill=NA)))
head(res)
#   Time   X220  Time   X220  Time  X220   Time  X220
#1 0.700    111 1.400   2370 0.850   520  1.600 21216
#2 2.083 131747 1.650 179289 1.633 54607  1.900  3816
#3 2.517  23428 2.100  21690 2.117 13677  2.117  3573
#4 2.667  12528 2.267  10383 2.267 13448  2.300 11349
#5 3.883   1055 3.017    816 3.567  1346  9.717   292
#6 4.500    881 3.383    637 5.350   772 21.600  3774

data

 lst <- list(data.frame(V1=21:30, V6=1:10, V7= 11:20), 
             data.frame(V6=1:3, V7=11:13, V1= 21:23))

NOTE: The above data is just for reproducing the problem.
