Reputation: 111
I have many text files. I want to load them all and then build a new matrix from certain columns of every file.
For example, some of the matrices look like this:
# names that start with a digit must be backquoted in R
`1a` <- replicate(10, rnorm(20))
`1b` <- replicate(10, rnorm(19))
`2a` <- replicate(10, rnorm(18))
`2b` <- replicate(10, rnorm(15))
To find them, I put them all in one folder, set my working directory there, and list them like this:
filelist = list.files(pattern = "\\.txt$")
Then I want to put the first column of 1a plus its V6 and V7 in a new matrix, then V6 and V7 from 1b in a new matrix, then V6 and V7 from 2a, and then V6 and V7 from 2b.
The files do not have the same length (their numbers of rows differ). I would like to do two things:
1- Save each file with only the selected columns, adding an R to the name. For example, if the original file is 1a, select V6 and V7 and save a new file with only those 2 columns, named 1aR.
2- Make a new matrix and put all the selected columns in it (where the lengths are not equal, the gaps can be filled with NA or 0).
Upvotes: 0
Views: 2103
Reputation: 887118
Here is an option to read the files, select the concerned columns from the dataset, and create a new dataset.
We get the files that follow a particular file name pattern in the working directory using list.files.
filelist <- list.files(pattern='\\d+[^0-9]+\\.txt', full.names=TRUE)
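To see which names that pattern accepts, here is a small check with grepl. The file names in `candidates` are made up for illustration and are not from the question.

```r
# Hypothetical file names, used only to illustrate the pattern
candidates <- c("1a.txt", "2b.txt", "12abc.txt", "notes.txt", "1a.csv", "3.txt")

# Same regular expression as in the list.files call:
# one or more digits, one or more non-digits, then a literal ".txt"
pattern <- "\\d+[^0-9]+\\.txt"

matched <- candidates[grepl(pattern, candidates)]
print(matched)
```

Note that the pattern is not anchored, so a name that merely contains such a sequence would also match. Adding `^` and `$` is one way to tighten it if that matters for your folder.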
Then, read all the files into a list using either read.csv/read.table or fread from data.table.
lst <- lapply(filelist, read.csv, header=TRUE, stringsAsFactors=FALSE)
Extract the V6 and V7 columns from each element of 'lst'
lst1 <- lapply(lst, "[", c("V6", "V7"))
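For the first part of the question (saving each reduced file with an added R in the name), a minimal sketch could look like the following. The temporary folder and the two demo files are assumptions made so the example runs on its own. With real data you would keep your own `filelist`, `lst` and `lst1`.

```r
# Assumption: a throwaway folder with two small comma-separated demo files
demo_dir <- tempfile("demo")
dir.create(demo_dir)
write.csv(data.frame(V1 = 21:30, V6 = 1:10, V7 = 11:20),
          file.path(demo_dir, "1a.txt"), row.names = FALSE)
write.csv(data.frame(V1 = 21:23, V6 = 1:3, V7 = 11:13),
          file.path(demo_dir, "1b.txt"), row.names = FALSE)

# Same steps as above: list, read, select columns by name
filelist <- list.files(demo_dir, pattern = "\\d+[^0-9]+\\.txt", full.names = TRUE)
lst  <- lapply(filelist, read.csv, header = TRUE, stringsAsFactors = FALSE)
lst1 <- lapply(lst, "[", c("V6", "V7"))

# Build the output names: "1a.txt" becomes "1aR.txt"
outfiles <- sub("\\.txt$", "R.txt", filelist)

# Write one reduced file per input file
for (i in seq_along(lst1)) {
  write.table(lst1[[i]], outfiles[i], row.names = FALSE, quote = FALSE)
}
```

Be aware that the new files also match the list.files pattern, so a second run in the same folder would pick them up as inputs unless they are written elsewhere or the pattern is tightened.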
If the data.frame elements in the list have unequal numbers of rows, one option is cbind.fill from library(rowr)
library(rowr)
res <- cbind.fill(lst[[1]][1], do.call(cbind.fill,
c(lst1, list(fill=NA))), fill=NA)
res
#   V1 V6 V7 V6.1 V7.1
#1  21  1 11    1   11
#2  22  2 12    2   12
#3  23  3 13    3   13
#4  24  4 14   NA   NA
#5  25  5 15   NA   NA
#6  26  6 16   NA   NA
#7  27  7 17   NA   NA
#8  28  8 18   NA   NA
#9  29  9 19   NA   NA
#10 30 10 20   NA   NA
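If installing rowr is not an option, the same NA padding can be sketched in base R. This is an alternative to cbind.fill, not what the code above does internally. `pad_rows` is a hypothetical helper name introduced here, and the column names of the result differ slightly (duplicates are kept as V6, V7 rather than renamed).

```r
# Example data with unequal numbers of rows
lst <- list(data.frame(V1 = 21:30, V6 = 1:10, V7 = 11:20),
            data.frame(V6 = 1:3, V7 = 11:13, V1 = 21:23))
lst1 <- lapply(lst, "[", c("V6", "V7"))

# Hypothetical helper: extend a data frame to n rows, filling with NA
pad_rows <- function(df, n) {
  out <- df[seq_len(n), , drop = FALSE]  # row indices past nrow(df) give NA rows
  rownames(out) <- NULL
  out
}

# Pad every element to the longest length, then bind column-wise
n_max  <- max(vapply(lst1, nrow, integer(1)))
padded <- lapply(lst1, pad_rows, n = n_max)
res_base <- do.call(cbind, c(list(lst[[1]]["V1"]), padded))
print(dim(res_base))
```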
Then, we write the file as .txt
write.table(res, 'CombinedV6_V7.txt', row.names=FALSE, quote=FALSE)
Using the data from the link
lst <- lapply(filelist, read.csv, sep='\t',
header=TRUE, stringsAsFactors=FALSE)
lst1 <- lapply(lst, "[", c("Time", "X220"))
res <- do.call(cbind.fill, c(lst1, list(fill=NA)))
head(res)
#   Time   X220  Time   X220  Time  X220   Time  X220
#1 0.700    111 1.400   2370 0.850   520  1.600 21216
#2 2.083 131747 1.650 179289 1.633 54607  1.900  3816
#3 2.517  23428 2.100  21690 2.117 13677  2.117  3573
#4 2.667  12528 2.267  10383 2.267 13448  2.300 11349
#5 3.883   1055 3.017    816 3.567  1346  9.717   292
#6 4.500    881 3.383    637 5.350   772 21.600  3774
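Because every file contributes columns with the same names, the combined result has repeated Time/X220 headers. One possible way to keep track of the source is to prefix each column with its file name before binding. The file names and the toy values below are assumptions for illustration only.

```r
# Hypothetical file names and toy data standing in for the real inputs
filelist <- c("./1a.txt", "./1b.txt")
lst1 <- list(data.frame(Time = c(0.700, 2.083), X220 = c(111, 131747)),
             data.frame(Time = c(1.400, 1.650), X220 = c(2370, 179289)))

# File identifiers without folder or extension: "1a", "1b"
ids <- tools::file_path_sans_ext(basename(filelist))

# Prefix each data frame's column names with its file identifier
lst1_named <- Map(function(df, id) {
  names(df) <- paste(id, names(df), sep = "_")
  df
}, lst1, ids)

print(names(lst1_named[[1]]))
print(names(lst1_named[[2]]))
```

The renamed list can then be passed to the combining step in place of `lst1`.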
lst <- list(data.frame(V1=21:30, V6=1:10, V7= 11:20),
data.frame(V6=1:3, V7=11:13, V1= 21:23))
NOTE: The 'lst' defined above is example data used only to reproduce the problem; it produces the first 'res' output shown earlier.
Upvotes: 1