Reputation: 37
I am using R to read a csv. But I do not want whole dataset in memory as dataset is too large. I need to read rows based on one column's category.
I want to read only rows where col2 = 'A'
Example :
col1 col2 col3
1 A 1000
2 B 2000
3 A 1000
4 A 2000
5 A 1000
6 B 2000
Upvotes: 3
Views: 4633
Reputation: 20080
You could try to use fread
from data.table
package with cmd
option. From documentation:
A shell command that pre-processes the file; e.g. fread(cmd=paste("grep",word,"filename"). See Details.
Shell commands:
fread accepts shell commands for convenience. The input command is run and its output written to a file in tmpdir (link{tempdir}() by default) to which fread is applied "as normal". The details are platform dependent -- system is used on UNIX environments, shell otherwise; see system.
So if you run something like
library(data.table)
t <- fread(......., cmd=paste("grep","' A '","filename"), .....)
then it filters lines which contains A
(A surrounded by spaces) and then apply fread
to the result.
Upvotes: 7
Reputation: 1
One of these should solve the issue:
fread(file=file_name, select=col_names)[specific_col_name %in% ID_name]
or
fread(file=file_name, select=col_names)[grep(pattern, specific_col_name, ignore.case = TRUE)]
Upvotes: 0
Reputation: 886948
We could use sqldf
library(sqldf)
df1 <- read.csv.sql("file.csv", "select *, from file where col2 = 'A'", sep=",")
Upvotes: 2