Reputation: 247
I have a file (called data) with 1000 lines, which looks like this:
ChrNum Base_Position Gene
1 108 NotGene
1 114 Gene
1 160 NotGene
I have similar files in the directory, so I wanted to write a function that goes through a file and gives me the Base_Position of each gene.
I wrote this function to do so:
position <- apply(data,1,function(a) {
#go along each row and see if col3 is "Gene"
genes <- data[data[,3]=='Gene',]
#give me the position
genes.up <- genes[,2]
return(genes.up)
})
But when I look at the result, I get:
> position
#[1] 114 114 114
Every row is filled with the answer I am looking for, rather than the answer appearing once. I can't get my head around what I've done wrong.
Upvotes: 0
Views: 192
Reputation: 23216
install.packages("sqldf")  # only needed once
library(sqldf)
# keep only the positions of the rows whose Gene column is 'Gene'
position <- sqldf("select Base_Position from data where Gene = 'Gene'")
You said you have multiple files. There are a number of ways to scale this up, from copying and pasting the code to turning it into a function. Wrapping it in a function sounds like it should suit your needs.
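As a rough sketch of what such a function could look like, here is a version in plain base R rather than sqldf. The name gene_positions is made up for illustration, and it assumes each file is whitespace-delimited with a header row, as in the sample above. It also shows why the original attempt repeated 114: apply() calls the function once per row, and each call ignored its argument and returned the whole filtered result. Filtering the data frame once avoids that.

```r
# Hypothetical helper: read one file and return the Base_Position of every
# row whose Gene column is exactly "Gene".
# Assumption: whitespace-delimited text with a header row.
gene_positions <- function(path) {
  data <- read.table(path, header = TRUE, stringsAsFactors = FALSE)
  # Filter the whole data frame once; no apply() over rows is needed.
  data$Base_Position[data$Gene == "Gene"]
}

# Small self-contained check using a temporary copy of the sample rows.
tmp <- tempfile(fileext = ".txt")
writeLines(c("ChrNum Base_Position Gene",
             "1 108 NotGene",
             "1 114 Gene",
             "1 160 NotGene"), tmp)
positions <- gene_positions(tmp)
print(positions)
```

With the three sample rows this should return the single position 114 instead of one copy per row.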
Upvotes: 1
Reputation: 413
dplyr works well for this sort of problem, where you filter a data frame and return part of it.
library(data.table)
library(dplyr)
return_positions <- function(filename) {
data <- fread(filename)
output <- data %>% filter(Gene == 'Gene') %>%
select(Base_Position)
return(output)
}
You should be able to scale that up by passing all the file names to the above function.
# lapply keeps one table per file; sapply may try to simplify the result
list_of_output_tables <- lapply(all_filenames, return_positions)
[Edit] Added how to do this for many files. If the files are reasonably large, I like using fread from data.table to read them.
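The answer does not say where all_filenames comes from, so here is one possible way to build it, sketched in base R so it runs without extra packages. The temporary directory, the .txt extension and the sample rows are all assumptions for illustration; in practice you would point list.files() at your own directory and could swap read.table() for fread().

```r
# Hypothetical setup: a temporary directory holding two small sample files.
data_dir <- tempfile("genes_")
dir.create(data_dir)
writeLines(c("ChrNum Base_Position Gene",
             "1 108 NotGene",
             "1 114 Gene",
             "1 160 NotGene"), file.path(data_dir, "a.txt"))
writeLines(c("ChrNum Base_Position Gene",
             "2 20 Gene",
             "2 35 Gene",
             "2 90 NotGene"), file.path(data_dir, "b.txt"))

# Collect the full paths of every .txt file in the directory.
all_filenames <- list.files(data_dir, pattern = "\\.txt$", full.names = TRUE)

# Read each file and keep only the positions of rows labelled "Gene".
results <- lapply(all_filenames, function(f) {
  d <- read.table(f, header = TRUE, stringsAsFactors = FALSE)
  d[d$Gene == "Gene", "Base_Position", drop = FALSE]
})

# Name each result after its file so the positions can be traced back.
names(results) <- basename(all_filenames)
print(results)
```

Naming the list elements by file keeps it clear which positions came from which file.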
Upvotes: 1