Anuj Gupta

Reputation: 37

How do I read a file while filtering rows based on a condition in R?

I am using R to read a CSV file, but the dataset is too large to load into memory in full. I need to read only the rows that match a particular category in one column.

I want to read only the rows where col2 = 'A'.

Example:

col1 col2 col3
1    A    1000
2    B    2000
3    A    1000
4    A    2000
5    A    1000
6    B    2000

Upvotes: 3

Views: 4633

Answers (3)

Severin Pappadeux

Reputation: 20080

You could try fread from the data.table package with its cmd option. From the documentation:

A shell command that pre-processes the file; e.g. fread(cmd=paste("grep",word,"filename")). See Details.

Shell commands:

fread accepts shell commands for convenience. The input command is run and its output written to a file in tmpdir (tempdir() by default) to which fread is applied "as normal". The details are platform dependent -- system is used on UNIX environments, shell otherwise; see system.

So if you run something like

library(data.table)
t <- fread(......., cmd=paste("grep","' A '","filename"), .....)

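then grep keeps only the lines that contain A surrounded by spaces, and fread is applied to the result. Note that grep also drops the header line unless it happens to match the pattern, so the column names may have to be supplied again (for example through fread's col.names argument). For a comma-separated file the pattern would be ',A,' rather than ' A '.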
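As a rough, self-contained sketch of this approach: the snippet below writes the question's sample data to a temporary comma-separated file (the file and its layout are assumptions made for illustration), lets grep keep only the rows whose middle field is A, and restores the column names by hand because grep removes the header line. It assumes a Unix-like system where grep is on the PATH.

```r
library(data.table)

# Hypothetical sample file mirroring the question's data, written as CSV
path <- tempfile(fileext = ".csv")
writeLines(c("col1,col2,col3",
             "1,A,1000",
             "2,B,2000",
             "3,A,1000",
             "4,A,2000",
             "5,A,1000",
             "6,B,2000"), path)

# grep passes through only the lines containing ",A,"; the header line does
# not match, so it is dropped and the names are supplied via col.names
dt <- fread(cmd = paste("grep", shQuote(",A,"), shQuote(path)),
            header = FALSE,
            col.names = c("col1", "col2", "col3"))

print(dt)
```

Matching ",A," only works here because col2 is the middle column and no other field can contain that text; for a real file a stricter pattern or an awk filter on the specific field may be safer.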

Upvotes: 7

KHOKHAR

Reputation: 1

One of these should solve the issue:

fread(file=file_name, select=col_names)[specific_col_name %in% ID_name] 

or

fread(file=file_name, select=col_names)[grep(pattern, specific_col_name, ignore.case = TRUE)] 

Upvotes: 0

akrun

Reputation: 886948

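We could use read.csv.sql from the sqldf package, which loads the file into a temporary database and returns only the rows selected by the SQL statement: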

library(sqldf)
df1 <- read.csv.sql("file.csv", "select * from file where col2 = 'A'", sep=",")
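A minimal runnable sketch of the same idea, assuming the question's sample data saved as a comma-separated file (the temporary file here is made up for illustration). Inside the SQL statement the table is referred to as file, whatever the actual file name is.

```r
library(sqldf)

# Hypothetical sample file mirroring the question's data, written as CSV
path <- tempfile(fileext = ".csv")
writeLines(c("col1,col2,col3",
             "1,A,1000",
             "2,B,2000",
             "3,A,1000",
             "4,A,2000",
             "5,A,1000",
             "6,B,2000"), path)

# The WHERE clause is evaluated inside the database, so only the
# matching rows are returned to R as a data frame
df1 <- read.csv.sql(path, sql = "select * from file where col2 = 'A'", sep = ",")

print(df1)
```

Unlike the grep approach, the header line is handled by read.csv.sql itself, and the condition is tied to the named column rather than to a text pattern anywhere in the line.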

Upvotes: 2
