Reputation: 2773
I have a CSV file which I read using the following code:
csvData <- read.csv(file="pf.csv", colClasses=c(NA, NA,"NULL",NA,"NULL",NA,"NULL","NULL","NULL"))
dimnames(csvData)[[2]]<- c("portfolio", "date", "ticker", "quantity")
It reads all lines from that file, but I want to skip some rows while reading. A row should not be read if the value of the ticker column is ABT or ADCT. Is that possible?
A sample of my CSV file is as follows:
RUS1000,01/29/1999,21st Centy Ins Group,TW.Z,90130N10,72096,1527.534,0.01,21.188
RUS1000,01/29/1999,3com Corp,COMS,88553510,358764,16861.908,0.16,47.000
RUS1000,01/29/1999,3m Co,MMM,88579Y10,401346,31154.482,0.29,77.625
RUS1000,01/29/1999,A D C Telecommunicat,ADCT,00088630,135114,5379.226,0.05,39.813
RUS1000,01/29/1999,Abbott Labs,ABT,00282410,1517621,70474.523,0.66,46.438
RUS1000,02/26/1999,21st Centy Ins Group,TW.Z,90130N10,72096,1378.836,0.01,19.125
RUS1000,02/26/1999,3com Corp,COMS,88553510,358764,11278.644,0.11,31.438
RUS1000,02/26/1999,3m Co,MMM,88579Y10,402146,29783.938,0.29,74.063
Upvotes: 16
Views: 16219
Reputation: 7879
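You can now do this with the readr package (loaded here through tidyverse, together with dplyr's filter).
library(tidyverse)
# Assumes the file has a header row that names the column "ticker"
csvData <- read_csv(file = "pf.csv") %>%
  filter(!ticker %in% c("ABT", "ADCT"))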
Upvotes: 1
Reputation: 1229
At first blush, the sqldf package's read.csv.sql looked great to me. But when I tried to use it, it failed to deal with "NULL" strings (others have found this out as well), and unfortunately it doesn't support all of read.csv's features. So I had to write my own function. I am surprised that there isn't a good package for this.
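# Return the header line plus every line of inputFile that matches `match`,
# reading n lines at a time and stopping once more than maxlines are collected.
# e.g. inputFile = 'simple.csv', match = 'APPLE'
fetchLines <- function(inputFile, match, fixed = TRUE, n = 100, maxlines = 100000) {
  message("reading: ", inputFile)
  n <- min(n, maxlines)
  con <- base::file(inputFile, open = "r", encoding = "UTF-8-BOM")
  on.exit(close(con))  # close the connection even on the early return below
  data <- readLines(con, n = 1, warn = FALSE)  # keep the header line
  while (length(chunk <- readLines(con, n = n, warn = FALSE)) > 0) {
    grab <- grep(match, chunk, value = TRUE, fixed = fixed)
    if (length(grab) > 0) {
      data <- c(data, grab)
      if (length(data) > maxlines) {
        warning("bailing out: too many matching lines")
        return(data)
      }
      cat(".")
    }
  }
  cat("\n")
  data
}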
# Assign the lines to a variable first to avoid the error:
# argument 'object' must deparse to a single character string
matched <- fetchLines("datafile.csv", "\\bP58\\b", fixed = FALSE, maxlines = 100000)
fdata <- textConnection(matched)
df <- read.csv(fdata, header = TRUE, sep = ",", na.strings = c("NULL", ""), fileEncoding = "UTF-8-BOM", stringsAsFactors = FALSE)
Related question: R textConnection: "argument 'object' must deparse to a single character string"
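For files that fit in memory, the same idea (filter the raw lines first, then parse only what is left) can be sketched in a few lines of base R. This is only an illustration applied to the sample rows from the question, not a replacement for the function above: the temporary file, the variable names and the regular expression are assumptions, and the file is assumed to have no header row, as in the sample.

```r
# Sample rows copied from the question, written to a temporary file.
# Assumption: the real file, like the sample, has no header row.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "RUS1000,01/29/1999,3m Co,MMM,88579Y10,401346,31154.482,0.29,77.625",
  "RUS1000,01/29/1999,A D C Telecommunicat,ADCT,00088630,135114,5379.226,0.05,39.813",
  "RUS1000,01/29/1999,Abbott Labs,ABT,00282410,1517621,70474.523,0.66,46.438",
  "RUS1000,02/26/1999,3com Corp,COMS,88553510,358764,11278.644,0.11,31.438"
), path)

raw <- readLines(path, warn = FALSE)

# Drop lines that contain ABT or ADCT as a complete comma-delimited field.
# Caveat: this checks every field, not only the ticker column.
keep <- !grepl(",(ABT|ADCT),", raw)

# Parse only the surviving lines, with the column selection from the question.
filtered <- read.csv(
  text = raw[keep], header = FALSE,
  colClasses = c(NA, NA, "NULL", NA, "NULL", NA, "NULL", "NULL", "NULL")
)
names(filtered) <- c("portfolio", "date", "ticker", "quantity")
print(filtered)
```

The unwanted rows are never parsed into the data frame, although readLines still loads every line of the file as text; for very large files a chunked reader like the one above is the safer choice.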
Upvotes: 1
Reputation: 121568
It is better to read everything and subset afterwards, as suggested in the comments:
csvData [!csvData$ticker %in% c('ADCT','ABT'),]
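As a self-contained sketch of this read-then-subset approach, the snippet below applies it to a few of the sample rows from the question. The temporary file and the variable name kept are made up for illustration, and header = FALSE is an assumption based on the sample showing no header row.

```r
# Sample rows copied from the question, written to a temporary file.
# Assumption: the file has no header row, so header = FALSE is used.
path <- tempfile(fileext = ".csv")
writeLines(c(
  "RUS1000,01/29/1999,3m Co,MMM,88579Y10,401346,31154.482,0.29,77.625",
  "RUS1000,01/29/1999,A D C Telecommunicat,ADCT,00088630,135114,5379.226,0.05,39.813",
  "RUS1000,01/29/1999,Abbott Labs,ABT,00282410,1517621,70474.523,0.66,46.438",
  "RUS1000,02/26/1999,3com Corp,COMS,88553510,358764,11278.644,0.11,31.438"
), path)

# Read every row, keeping only the four columns selected in the question.
csvData <- read.csv(
  file = path, header = FALSE,
  colClasses = c(NA, NA, "NULL", NA, "NULL", NA, "NULL", "NULL", "NULL")
)
names(csvData) <- c("portfolio", "date", "ticker", "quantity")

# Subset after reading: drop the rows whose ticker is ADCT or ABT.
kept <- csvData[!csvData$ticker %in% c("ADCT", "ABT"), ]
print(kept)
```

The subset keeps the original row names, so add rownames(kept) <- NULL if a fresh 1..n index is needed.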
EDIT
You can use fread from the data.table package for a more efficient way to read your file.
library(data.table)
fread(file="pf.csv")
Upvotes: 2
Reputation: 7130
It is possible with the sqldf package, using read.csv.sql.
Let's say the contents of sample.csv look like this:
id,name,age
1,"a",23
2,"b",24
3,"c",23
Now to read only rows where age=23:
require(sqldf)
df <- read.csv.sql("sample.csv", "select * from file where age=23")
df
  id name age
1  1  "a"  23
2  3  "c"  23
It is also possible to select only the columns you need:
df <- read.csv.sql("sample.csv", "select id, name from file where age=23")
df
  id name
1  1  "a"
2  3  "c"
Upvotes: 26