I have 50 text files, each containing several words, like this:
View(file1.txt)
one
two
three
four
cuatro
View(file2)
uno
five
seis
dos
Each file has only one column of words (one word per line), and the files have different lengths. I want to create a data frame in R that holds the contents of each file in its own column, with the file name as the column name:
  file1   file2   ...etc
1 one     uno
2 two     five
3 three   seis
4 four    dos
5 cuatro
So far I have loaded all the files into a list like this:
files <- lapply(list.files(pattern = "\\.txt$"), read.csv, header = FALSE)
> class(files)
[1] "list"
df <- data.frame(matrix(unlist(files), ncol= length(files)))
This is close, but wrong: there are no gaps (some columns should have more values than others), and the columns are not named after the files automatically.
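To see why, here is a minimal sketch with two hypothetical in-memory vectors standing in for the contents of file1.txt and file2.txt. `matrix()` fills column by column, and when the total number of words is not a multiple of the number of columns it recycles values from the start (with a warning) instead of leaving a gap:

```r
# Hypothetical stand-ins for the contents of file1.txt and file2.txt
file1 <- c("one", "two", "three", "four", "cuatro")
file2 <- c("uno", "five", "seis", "dos")

# 9 words into 2 columns: matrix() chooses 5 rows and recycles to fill 10 cells.
# R warns about the data length; the warning is suppressed here.
m <- suppressWarnings(matrix(unlist(list(file1, file2)), ncol = 2))
print(m)
```

The last cell of the second column comes out as "one", recycled from the first file, rather than a missing value.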
The idea is to find the file with the most lines and pad the shorter ones with NA up to that length, so that all vectors have the same length and can be combined into one data frame.
There are several ways to do this; here is one:
# simplify = FALSE keeps a named list (names = file names), even if all files have the same length
files <- sapply(list.files(pattern = "\\.txt$"), readLines, simplify = FALSE)
max_len <- max(sapply(files, length))
df <- data.frame(sapply(seq_along(files), function(i) {
  len <- length(files[[i]])
  if (len < max_len) {
    # pad shorter files with NA up to the longest length
    append(files[[i]], rep(NA, max_len - len))
  } else {
    files[[i]]
  }
}))
names(df) <- basename(tools::file_path_sans_ext(names(files)))
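A more compact variant of the same padding idea, offered as a sketch: assigning `length(x) <- n` to a vector longer than it is extends it with `NA`, so the explicit `append()`/`rep()` step is not needed. The `words_demo` directory and the two sample files are hypothetical and are written to `tempdir()` only so the example runs on its own:

```r
# Hypothetical sample files, written to a temporary directory so the sketch is self-contained
demo_dir <- file.path(tempdir(), "words_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines(c("one", "two", "three", "four", "cuatro"), file.path(demo_dir, "file1.txt"))
writeLines(c("uno", "five", "seis", "dos"), file.path(demo_dir, "file2.txt"))

# Read every .txt file into a named list of character vectors
paths <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)
words <- lapply(paths, readLines)
names(words) <- tools::file_path_sans_ext(basename(paths))

# Pad each vector up to the longest one; `length<-` fills the new slots with NA
max_len <- max(lengths(words))
padded <- lapply(words, function(x) {
  length(x) <- max_len
  x
})

# The list names become the column names
df <- as.data.frame(padded, stringsAsFactors = FALSE)
print(df)
```

Because the list is named before conversion, the columns pick up the file names directly, so no separate `names(df) <-` step is needed (file names that are not syntactic R names may be adjusted by `check.names`).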
Try this: get the file names, read the files in, find the maximum number of rows, then extend every file to that many rows (indexing past the last row fills with NA). Finally, convert to a data.frame:
f <- list.files(pattern = "\\.txt$", full.names = TRUE)
names(f) <- tools::file_path_sans_ext(basename(f))
res <- lapply(f, read.table)
maxRow <- max(sapply(res, nrow))
data.frame(lapply(res, function(i) i[seq(maxRow), ]))
#    file1 file2
# 1    one   uno
# 2    two  five
# 3  three  seis
# 4   four   dos
# 5 cuatro  <NA>
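The `i[seq(maxRow), ]` step works because row-indexing a data frame past its last row returns `NA`, and because `[` drops a one-column data frame to a plain vector. A minimal sketch, using a hypothetical one-column data frame as a stand-in for what `read.table()` returns for file2.txt:

```r
# Hypothetical one-column data frame, like read.table() would give for file2.txt
d <- data.frame(V1 = c("uno", "five", "seis", "dos"), stringsAsFactors = FALSE)

# Ask for 5 rows from a 4-row data frame: row 5 comes back as NA,
# and with a single column the result is dropped to a character vector
padded <- d[seq(5), ]
print(padded)
```

This relies on each file holding a single column of words; with more columns, `[` keeps a data frame and the resulting column names would no longer be just the file names.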