Selectively reading CSV file in r

Question

I have a large data file in CSV format of which I need to import only certain rows .Let's call this large file A.csv.

I have another csv file i.e. B.csv which has two columns and some rows.

Now I need to import only those rows of data from A.csv which have first two column values same as the B.csv column values for a certain row .So,I tried this after importing both files

But it seems to take forever

while(count<4632)
{
    count=count+1
    count2=0
    while(count2<17415)
    {
        count2=count 2+1
        if(B[count,1]==A[count2,1])
            dbase[count,]=A[count2,]
    }
}

Please help!!

Jasper · Accepted Answer

You have two nested big loops, and you are dynamically growing a vector. Both are bad for performance. Try to vectorise both operations.

For example:

set.seed(123)
dfA <- data.frame(
    a = sample(LETTERS, 10000, TRUE),
    b = sample(LETTERS[1:3], 10000, TRUE),
    c = rnorm( 10000 ),
    stringsAsFactors = FALSE
)
dfB <- data.frame(
    a = sample(LETTERS, 1000, TRUE),
    b = sample(LETTERS[1:3], 1000, TRUE),
    stringsAsFactors = FALSE
)

dfC <- dfA[ which( paste(dfA$a, dfA$b) %in% paste(dfB$a, dfB$a)), ]

Selectively reading CSV file in r

Answers (2)

Related Questions