MT32

Reputation: 677

Extract the contents of one column from a list of data frames in R

I have 1900 files. I imported them into the R environment.

temp<-list.files(pattern="foodconsumption")

Each file has 300 columns.

I would like to take only data from column 25, and rbind them all together from one end to another.

I know about lapply, but I do not know how to write the function to extract only column 25 (named V25).

I watched a tutorial online, and it uses function(elt).

lapply(temp, function(elt) elt[, 25])

But I got this error: Error in [.default(elt, , 25) : incorrect number of dimensions

Is there an easier way to go about this?

Thanks!

Upvotes: 1

Views: 3241

Answers (2)

Xinlu

Reputation: 140

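library(tidyverse)
# map_dfc binds the results column-wise; use map_dfr instead to stack them row-wise (rbind)
list.files(pattern = "\\.csv$") %>%  # escape the dot so only names ending in .csv match
    map_dfc( ~read_csv(.x) %>%  # read each file, then combine the selected columns into one data frame
              select(25) # choose column 25 by position
           )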

Upvotes: 1

Melissa Key

Reputation: 4551

You may already be aware of this (in which case, I apologize), but the function list.files only lists files: its output is a character vector with the names of all files matching the pattern. It doesn't actually import the files, which is why indexing an element with [, 25] fails. I would set up your procedure as follows.

Note that I'm assuming that you are dealing with .csv files. This should work with appropriate modifications for any text files. Additional packages are needed if they are .xlsx files. If they are .Rdata files, other modifications are needed.
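For .Rdata files, one possible modification is sketched below. It assumes each file holds a single data frame, which is a guess about your data. The demo file, the object name food, and the helper read_col25_rdata are made up for illustration.

```r
# Demo setup (hypothetical data): save a 30-column data frame to an .RData file.
demo_file <- tempfile(fileext = ".RData")
food <- as.data.frame(matrix(seq_len(2 * 30), nrow = 2, ncol = 30))
save(food, file = demo_file)
rm(food)

# Hypothetical helper: load one .RData file into its own environment and
# return column 25 of the first object stored in it.
read_col25_rdata <- function(file) {
  env <- new.env()
  obj_names <- load(file, envir = env)  # load() returns the names of the restored objects
  get(obj_names[1], envir = env)[[25]]
}

col25 <- read_col25_rdata(demo_file)
```

Loading into a fresh environment keeps the objects from 1900 files from overwriting each other in your workspace.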

files <- list.files(pattern = "foodconsumption")
result <- sapply(files, function(file) {
   # read in file
   temp <- read.csv(file) # adjustments may be needed for headers, etc.

   # return column 25
   temp[,25]
})

Assuming every file has the same number of rows, the output of this is a matrix with one column per file (1900 here) and as many rows as there are in each file. To get one row per file instead, which is the equivalent of rbind-ing the column 25 vectors, we just take the transpose:

t(result)
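If what you actually want is one long vector (column 25 of the first file, followed by column 25 of the second, and so on), a minimal sketch is below. It writes three small made-up files to a temporary directory so it can run on its own. The file names, the 30-column layout, and the helper read_col25 are assumptions for illustration only.

```r
# Demo setup (hypothetical data): three small CSV files with 30 columns each,
# written to a temporary directory so the sketch is self-contained.
demo_dir <- tempfile("foodconsumption_demo")
dir.create(demo_dir)
for (i in 1:3) {
  demo <- as.data.frame(matrix(i * 100 + seq_len(4 * 30), nrow = 4, ncol = 30))
  write.csv(demo, file.path(demo_dir, sprintf("foodconsumption_%d.csv", i)),
            row.names = FALSE)
}

files <- list.files(demo_dir, pattern = "foodconsumption", full.names = TRUE)

# Hypothetical helper: read one file and return column 25 as a plain vector.
read_col25 <- function(file) read.csv(file)[[25]]

# lapply() gives one vector per file; unlist() joins them end to end.
stacked <- unlist(lapply(files, read_col25))
length(stacked)  # 3 files x 4 rows each
```

This version does not care whether the files have the same number of rows, since nothing is forced into a matrix.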

If the number of rows is different, the output is a list and the transpose will not work. In that case, you'll need to fill in the missing values:

max_length <- max(sapply(result, length))
result_mat <- sapply(result, function(x) {
  # pad shorter columns with NA so every column has max_length entries
  if (length(x) < max_length) c(x, rep(NA, max_length - length(x)))
  else x
})

Note that this implicitly assumes that all the missing data is at the end, and/or that the order of data within each file is irrelevant. If that's not the case, be very careful with creating a matrix here - it may be better to work with the data as a list.
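As a rough sketch of working with the data as a list, the example below keeps one list element per file and then builds a long data frame that records which file each value came from. The two demo files, their row counts, and the column names file and value are made up for illustration.

```r
# Demo setup (hypothetical data): two CSV files with different numbers of rows.
demo_dir <- tempfile("foodconsumption_list_demo")
dir.create(demo_dir)
n_rows <- c(3, 5)
for (i in seq_along(n_rows)) {
  demo <- as.data.frame(matrix(seq_len(n_rows[i] * 30), nrow = n_rows[i], ncol = 30))
  write.csv(demo, file.path(demo_dir, sprintf("foodconsumption_%d.csv", i)),
            row.names = FALSE)
}

files <- list.files(demo_dir, pattern = "foodconsumption", full.names = TRUE)

# Keep column 25 of each file as one list element, named after its file.
col_list <- lapply(files, function(file) read.csv(file)[[25]])
names(col_list) <- basename(files)

# Long format: one row per value, with the source file recorded alongside it.
long <- data.frame(
  file  = rep(names(col_list), times = lengths(col_list)),
  value = unlist(col_list, use.names = FALSE)
)
nrow(long)  # 3 + 5 rows
```

No padding with NA is needed here, so nothing is assumed about where values are missing within a file.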

Upvotes: 3
