How to create single table by extracting certain cells from multiple CSV files

Question

I am wondering whether it is possible to create a new data frame from certain cells of each file in the working directory. For example, say I have two data frames like this (please ignore the numbers, as they are random):

[image: the two sample data frames, reproduced by the sample code below]

Say that in each dataset, row 4 is the sum of my values and row 5 is the number of missing values. If I represent the number of missing values as "M" and the sum of each column as "N", what I am trying to achieve is the following table:

[image: the desired output table, one row per file with an N and an M value for each of v1, v2 and v3]

So the 'N' and 'M' values from each file end up in a single row.

I have many files in the directory, so I have read them into a list, but I am not sure what the best way would be to perform such a task on a list of files.

This is my sample code for the tables shown above, and how I read them into a list:

# Create sample data

df = data.frame(Type = 'wind', v1=c(1,2,3,100,50), v2=c(4,5,6,200,60), v3=c(6,7,8,300,70))
df2 =data.frame(Type = 'test', v1=c(3,2,1,400,40), v2=c(2,3,4,500,30), v3=c(6,7,8,600,20))

# write to directory
write.csv(df, file = "sample1.csv", row.names = FALSE)
write.csv(df2, file = "sample2.csv", row.names = FALSE)

# read to list
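# pattern is a regular expression, so escape the dot and anchor at the end
mycsv <- dir(pattern = "\\.csv$")
n <- length(mycsv)

mylist <- vector("list", n)
# seq_len(n) is safe even when no files match (1:n would give c(1, 0))
for (i in seq_len(n)) mylist[[i]] <- read.csv(mycsv[i], header = TRUE)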

I would be really grateful if you could give me some suggestions on whether this is possible and how I should approach it.

Many thanks,
Ayan

Josh O'Brien · Accepted Answer

This should work:

processFile <- function(File) {
    d <- read.csv(File, skip = 4, nrows = 2, header = FALSE, 
                  stringsAsFactors = FALSE)
    dd <- data.frame(d[1,1], t(unlist(d[-1])))
    names(dd) <- c("ID", "v1N", "V1M", "v2N", "V2M", "v3N", "V3M") 
    return(dd)
}

ll <- lapply(mycsv, processFile)
do.call(rbind, ll)
#     ID v1N V1M v2N V2M v3N V3M
# 1 wind 100  50 200  60 300  70
# 2 test 400  40 500  30 600  20
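
The `skip = 4, nrows = 2` arguments work because of how the sample files are laid out: with `header = FALSE`, the header line counts as an ordinary line, so skipping four lines drops the header plus the first three data rows, leaving the sum row and the missing-count row. A minimal sketch of that step, assuming a file with the same layout as sample1.csv; the `csv_text` string is an illustrative stand-in so the example needs no files on disk:

```r
# In-memory copy of sample1.csv (illustrative stand-in for the real file)
csv_text <- "Type,v1,v2,v3
wind,1,4,6
wind,2,5,7
wind,3,6,8
wind,100,200,300
wind,50,60,70"

# skip = 4 drops the header line plus the first three data rows;
# nrows = 2 then reads only the sum row and the missing-count row
d <- read.csv(text = csv_text, skip = 4, nrows = 2, header = FALSE,
              stringsAsFactors = FALSE)
d
```

If the real files have a different number of rows above the sum row, `skip` would need to change accordingly.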

(The one slightly tricky or unusual bit is the line of processFile() that builds dd. Here's a code snippet that should help you see how it accomplishes what it does.)

(d <- data.frame(a="wind", b=1:2, c=3:4))
#      a b c
# 1 wind 1 3
# 2 wind 2 4
t(unlist(d[-1]))
#      b1 b2 c1 c2
# [1,]  1  2  3  4
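
Because `unlist()` walks a data frame column by column, the same idea can be applied to data frames that are already in memory, such as the `mylist` built in the question, and the output names can be derived from the column names instead of being typed out. This is only a sketch, assuming every data frame has the sample layout (the type label in the first column, sums in row 4, missing counts in row 5); `extractRow` is a hypothetical helper name, not part of the answer above:

```r
# Sample data matching the question; in practice mylist comes from read.csv()
df  <- data.frame(Type = "wind", v1 = c(1, 2, 3, 100, 50),
                  v2 = c(4, 5, 6, 200, 60), v3 = c(6, 7, 8, 300, 70))
df2 <- data.frame(Type = "test", v1 = c(3, 2, 1, 400, 40),
                  v2 = c(2, 3, 4, 500, 30), v3 = c(6, 7, 8, 600, 20))
mylist <- list(df, df2)

# Hypothetical helper: collapse rows 4 (N) and 5 (M) of one data frame into one row
extractRow <- function(d) {
    # Column-major order: v1 row 4, v1 row 5, v2 row 4, v2 row 5, ...
    vals <- unlist(d[4:5, -1])
    # Build names v1N, v1M, v2N, v2M, ... from the value column names
    names(vals) <- paste0(rep(names(d)[-1], each = 2), c("N", "M"))
    data.frame(ID = as.character(d[1, 1]), t(vals), stringsAsFactors = FALSE)
}

result <- do.call(rbind, lapply(mylist, extractRow))
result
```

Deriving the names this way means the helper does not need editing when a file has more or fewer value columns, as long as all files in the list share the same columns so that `rbind` can stack them.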
