user3816990

Reputation: 247

Filtering rows in R

I have a file (called data) with 1000 lines that looks like this:

ChrNum Base_Position Gene    
1    108 NotGene     
1    114 Gene     
1    160 NotGene 

I have similar files in the directory, so I wanted to write a function that goes through a file and gives me the Base_Position of each gene.

I wrote this function to do so

position <- apply(data, 1, function(a) {
    # go along each row and see if col3 is "Gene"
    genes <- data[data[,3]=='Gene',]
    # give me the position
    genes.up <- genes[,2]
    return(genes.up)
})

but when I look at the result it is

> position                                       
#[1] 114 114 114

Every row of the result is filled with the answer I am looking for, rather than the answer appearing once. I can't work out what I've done wrong.

Upvotes: 0

Views: 192

Answers (2)

Hack-R

Reputation: 23216

# install.packages("sqldf")  # run once if sqldf is not installed yet
library(sqldf)
# keep only the rows whose Gene column equals 'Gene'
position <- sqldf("select * from data where data.Gene = 'Gene'")

You said you have multiple files. There are a number of ways to scale this up, from copying and pasting the code to turning it into a function. It sounds like wrapping it in a function should suit your needs.
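As a rough illustration of wrapping the filter in a function, here is a minimal sketch in base R rather than sqldf, so it runs without extra packages. The function name `gene_positions` and the example data frame are hypothetical and only mirror the rows shown in the question. Because the comparison `df$Gene == "Gene"` is vectorised, it tests every row at once and no `apply()` call is needed.

```r
# Hypothetical helper: return Base_Position for every row whose Gene column is "Gene"
gene_positions <- function(df) {
  # The comparison is vectorised, so it checks all rows in one step
  df$Base_Position[df$Gene == "Gene"]
}

# Example data mirroring the three rows shown in the question
example <- data.frame(
  ChrNum = c(1, 1, 1),
  Base_Position = c(108, 114, 160),
  Gene = c("NotGene", "Gene", "NotGene"),
  stringsAsFactors = FALSE
)

positions <- gene_positions(example)
print(positions)
```

The same function can then be called once per data frame, however each one was read in.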

Upvotes: 1

jraab

Reputation: 413

dplyr works well for this sort of problem, where you filter a data frame and return part of it.

library(data.table)
library(dplyr)
return_positions <- function(filename) { 
    data <- fread(filename)
    output <- data %>% filter(Gene == 'Gene') %>% 
         select(Base_Position)
    return(output)
}

You should be able to scale that function up by passing all of the file names to it.

# all_filenames: a character vector of file paths; lapply keeps one table per file
list_of_output_tables <- lapply(all_filenames, return_positions)

[Edit] Updated to show how to do this for many files. If the files are reasonably large, I like using fread from data.table to read them.
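For the many-files case, one way to build the vector of file names and keep track of which result came from which file is sketched below. This is an illustration under assumptions, not part of the original answer: it uses base `read.table` in place of `fread` so that it runs without extra packages, the helper name `read_gene_positions` is hypothetical, and the temporary directory, file names and `.txt` pattern are made up for the demo.

```r
# Hypothetical helper: read one whitespace-delimited file, return its gene positions
read_gene_positions <- function(filename) {
  dat <- read.table(filename, header = TRUE, stringsAsFactors = FALSE)
  dat$Base_Position[dat$Gene == "Gene"]
}

# Demo setup only: write two small example files into a temporary directory
demo_dir <- tempfile("genes_")
dir.create(demo_dir)
writeLines(c("ChrNum Base_Position Gene", "1 108 NotGene", "1 114 Gene", "1 160 NotGene"),
           file.path(demo_dir, "sample_a.txt"))
writeLines(c("ChrNum Base_Position Gene", "2 20 Gene", "2 35 Gene"),
           file.path(demo_dir, "sample_b.txt"))

# Collect the file paths; the ".txt" pattern is an assumption about the naming
all_filenames <- list.files(demo_dir, pattern = "\\.txt$", full.names = TRUE)

# lapply always returns a list with one element per file
positions_by_file <- lapply(all_filenames, read_gene_positions)
names(positions_by_file) <- basename(all_filenames)
print(positions_by_file)
```

Naming the list elements after the files makes it easy to look up the positions for a given file later, for example `positions_by_file[["sample_a.txt"]]`.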

Upvotes: 1
