Reputation: 14361
Doing this in Excel using VBA is straightforward, but for extended data analysis and plotting I need to use R. The instrument's software only outputs .csv files, without headers, so the data has to be read into R separately for analysis.
What I am trying to do (need to do):
I have already worked out how to read a .csv file into R, store it in a data frame, perform the analysis on the data frame, and plot the results. This workflow has to be repeated daily for each instrument run, and I run two instruments, so I want to save a code snippet that can be run each time.
The only issues I am facing are that I am not sure how to locate those specific rows, and then how to go about outputting the results to Excel.
.csv input file format:
V1, V2, V3, V4
numeric (0 decimals), numeric (4 decimals), alphanumeric, alphanumeric
001, 12.8975, XY03, XY05
...
485, 89.2134, XY02, XY7
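For reference, this is roughly how I read a single file at the moment (the file name and the colClasses are just placeholder guesses here):
# read one headerless instrument file; columns come out as V1..V4
fl1 <- read.csv("somefile.csv", header = FALSE,
                colClasses = c("integer", "numeric", "character", "character"))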
Update 1:
Due to some changes to the instrument software, the files are now in ASCII format instead of CSV, which brings a lot of limitations. I faced issues installing the raster package to support ASCII, so I installed the sqldf package instead, to filter rows with SQL. However, that gives a warning.
> fl1 <- read.table("~/rcodes/dt07042017/fileone.ASC", sep=",")
# read ok
> sv1 <- sqldf('select FNAME "F1", V2, V3, V4 from fl1 where V1 in (101, 201, 301)')
Warning message: Quoted identifiers should have class SQL, use DBI::SQL()
if the caller performs the quoting.
Then I installed the DBI package (not sure what it does) and the SQL worked. But this is just one file; I need to read thousands of files, pick specific row values from each, and save them all into one table/data frame. I wrote the following broken lines. It fails because I can't find a way to inject the file name into the query.
> df <- NULL
> fn <- NULL
> n <- 1
> for (f in files) {
+   fn <- file_path_sans_ext(f)   # from the tools package
+   df0 <- read.table(f, sep=",")
+   n <- n + 1
+   df <- sqldf('select n, fn, V1, V2 from df0
+               where V1 IN (101, 201, 301, 401)')
+   # I thought sqldf could read fn and n just like it reads df0
+   df <- rbind(df)   # furthermore, fn and n only pick up two files
+ }
Error in rsqlite_send_query(conn@ptr, statement) :
no such column: n, df1
I am not quite happy using SQL in R. Is there another way in R to do this, without the SQL packages?
Also, instead of keeping the 2:20 values (19 values) in that column, I want the mean of rows 2 to 20. How can I do that?
# set working directory to the folder where the csv files are located
files <- list.files(pattern = '\\.csv$')
m <- data.frame()
for (k in 1:length(files)) {
  csv <- read.csv(files[k], header = FALSE)
  # pick up rows 2:20 consecutively, plus rows 50, 120, 150 and so on
  data <- csv[c(2:20, 50, 120, 150, 175, 200), c(1, 2)]
  # tried to pivot/transpose with: data <- as.data.frame(t(data))
  # but that line screwed up the data when the selected values contain NA/blanks
  data$file <- files[k]
  m <- rbind(m, data)
}
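Something like the sketch below (untested, replacing the middle of the loop) is roughly what I have in mind, here assuming the mean is taken over column 2, but I am not sure it is the right approach:
# replace the 19 individual values from rows 2:20 with their mean (column 2)
avg_2_20 <- mean(csv[2:20, 2], na.rm = TRUE)
data <- csv[c(50, 120, 150, 175, 200), c(1, 2)]
data <- rbind(data.frame(V1 = NA, V2 = avg_2_20), data)
data$file <- files[k]
m <- rbind(m, data)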
Upvotes: 0
Views: 424
Reputation: 24500
I don't quite get what you mean by "ASCII files". I hope the following is close enough to your problem. Say that you have the following files:
File 1.csv
101 0.8117 AFWSK QSZDP
099 0.4594 MDKKK DHVAH
301 0.8730 HMRCR PQSMB
012 0.4705 OHHZL CIKVM
012 0.1335 KKEHQ YFNJU
201 0.2967 XFIPZ PMNYT
301 0.2470 WKLIV TBHZP
101 0.9182 SSEHT QYBUW
101 0.8076 VKRSO BUGJL
301 0.5861 PICFN UFTWC
File 2.csv
201 0.0767 FHWLL FZKQF
301 0.3176 HDBVJ EIUNJ
301 0.9606 LHODD XRYFE
301 0.8327 NKGVT LYTUJ
012 0.8073 KOBXD GBEFA
099 0.9414 ZCOCX ENFAV
101 0.3683 UTBUG RKIQU
012 0.1515 XXJZI MEWVI
012 0.0383 IUKQQ PAIFV
301 0.2304 GKIKJ GEOZT
stored in a directory called path/to/my/dir. Say that this directory also contains other files, but you know that the ones containing data are named in the [0-9]+.csv format (i.e. some number, followed by .csv). Of course you have a large number of them and their names can differ; since the question doesn't contain any hint about this, I'm just providing an example, hoping that you can adapt this step to your real problem.
Now:
# get the names of all the data files; no manual typing
filenames <- list.files(path = "path/to/my/dir", pattern = "[0-9]+\\.csv", full.names = TRUE)
# read every file, keeping only the rows whose first column is 101, 201 or 301
dataFiltered <- sapply(filenames, function(x) {
  y <- read.table(x)
  y[y[[1]] %in% c(101, 201, 301), ]
}, simplify = FALSE)
# stack the filtered pieces and record which file each row came from
finalRes <- cbind(do.call(rbind, dataFiltered),
                  fileOrigin = rep(basename(names(dataFiltered)),
                                   vapply(dataFiltered, nrow, 1L)))
To get something like:
# V1 V2 V3 V4 fileOrigin
#1 101 0.8117 AFWSK QSZDP 1.csv
#2 301 0.8730 HMRCR PQSMB 1.csv
#3 201 0.2967 XFIPZ PMNYT 1.csv
#4 301 0.2470 WKLIV TBHZP 1.csv
#5 101 0.9182 SSEHT QYBUW 1.csv
#6 101 0.8076 VKRSO BUGJL 1.csv
#7 301 0.5861 PICFN UFTWC 1.csv
#8 201 0.0767 FHWLL FZKQF 2.csv
#9 301 0.3176 HDBVJ EIUNJ 2.csv
#10 301 0.9606 LHODD XRYFE 2.csv
#11 301 0.8327 NKGVT LYTUJ 2.csv
#12 101 0.3683 UTBUG RKIQU 2.csv
#13 301 0.2304 GKIKJ GEOZT 2.csv
The above should work, but it could be memory-expensive. If you have a big number of files, I suggest reading them in bunches and applying the above to each bunch: instead of passing the whole filenames object to sapply, pass a subset of it each time until you finish, and at the end rbind all the finalRes pieces together.
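A rough sketch of that batching idea (the bunch size of 100 and the processBunch helper are arbitrary choices/names of mine):
# process the files in bunches of 100 to limit memory usage
bunchSize <- 100
bunches <- split(filenames, ceiling(seq_along(filenames) / bunchSize))
processBunch <- function(fns) {
  dataFiltered <- sapply(fns, function(x) {
    y <- read.table(x)
    y[y[[1]] %in% c(101, 201, 301), ]
  }, simplify = FALSE)
  cbind(do.call(rbind, dataFiltered),
        fileOrigin = rep(basename(names(dataFiltered)),
                         vapply(dataFiltered, nrow, 1L)))
}
# one result per bunch, then stack them all together
finalRes <- do.call(rbind, lapply(bunches, processBunch))
And since you also asked about getting the output into Excel: a plain
write.csv(finalRes, "finalRes.csv", row.names = FALSE)
produces a file that Excel opens directly (just one possible route).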
Upvotes: 1