Reputation: 13
Good morning all
I am still fairly new to R, and I have searched most forums for an answer to my problem (I suspect I am missing a crucial keyword somewhere), so apologies if I am duplicating a question. My problem is similar to this question, but the answer does not quite work for me.
I have a matrix with 1.7m-odd rows and, at this point, 20 columns. For the purposes of this exercise I only need to extract 20 rows from this matrix, but I will need to extract more than 1,000 later on. I would like to be able to import a list of all the rows I want and subset them into a smaller matrix for further analysis, and I keep hitting my head against the wall.
I have created a smaller matrix with just the 2 columns of interest, and set the row names to the animal IDs. The animal IDs are unique. Apologies for the clumsy coding.
EBV<-read.csv(file='bfile.csv', header=F, skip=1, sep=',', col.names=c("animal","anim_name","byear","anim_name_pa","anim_name_ma","sex","wwdir_ebv","wwdir_acc","wwmat_ebv","wwmat_acc","afc_ebv","afc_acc","icp_ebv","icp_acc","shd_ebv","shd_acc","scr_ebv","scr_acc","adg_ebv","adg_acc"))
head(EBV)
tail(EBV)
a<-subset(EBV, select=c(animal))
b<-subset(EBV, select=c(wwdir_ebv,wwdir_acc))
c<-as.numeric(as.character(unlist(a)))
d<-as.numeric(as.character(unlist(b)))
x<-matrix(d, nrow=1708891,ncol=2, byrow=F)
rownames(x)<-c
colnames(x)<-c("wwdir_ebv","wwdir_acc")
head(x)
Results of head(x):
*row.name* wwdir_ebv wwdir_acc
33525056 12.0321 49
33702721 13.6674 46
33791336 6.8078 63
33907452 11.0981 51
33909847 7.4192 67
34165696 8.5039 42
Now what I would like to do is something like this:
EX<-read.csv(file='braz.csv', header=F, sep=',', col.names=c("Ani"))
X<-as.numeric(as.list(unlist(EX)))
z<-subset(x, select=c('X'))
Where the "braz.csv" file only contains a single column, for argument's sake, with animals 33701721, 33791336 and 33909847. Extracting the animals one by one hasn't been too much of a problem, but typing 1,000 names one by one eventually will be.
I don't know if it would be more effective to keep the animal IDs in a column of their own (i.e., make a matrix of 1.7m x 3 instead of 1.7m x 2) and try to subset according to the column "animalID". My biggest concern is the list that I want to import and use for subsetting.
Thanks in advance!
Upvotes: 0
Views: 1762
Reputation: 132864
I don't know why you go to all that trouble of creating matrices instead of using the data.frame returned by `read.csv`. Your use of `subset` also confuses me (because `select` selects columns, but apparently you want to subset by rows).
It appears you simply need `x[rownames(x) %in% unlist(EX),]`. Generally, you'll find that `[` is not less convenient than `subset` for subsetting, but more powerful. `subset` can also result in trouble when used inside functions. I'd advise you to study `help("[")`. `help("%in%")` might also be worth reading.
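To make that concrete, here is a minimal sketch on a toy version of your data. The six rows are copied from your `head(x)` output, and `EX` is built by hand as a stand-in for what `read.csv` would return for 'braz.csv'. The names `z`, `missing_ids`, `EBV_small` and `z_df` are made up for this illustration. The `drop = FALSE` argument is my addition: without it, `[` returns a plain vector instead of a matrix when only one row matches.

```r
# Toy stand-in for the 1.7m-row matrix; values copied from head(x) in the question
x <- matrix(c(12.0321, 13.6674, 6.8078, 11.0981, 7.4192, 8.5039,
              49, 46, 63, 51, 67, 42),
            ncol = 2,
            dimnames = list(c("33525056", "33702721", "33791336",
                              "33907452", "33909847", "34165696"),
                            c("wwdir_ebv", "wwdir_acc")))

# Stand-in for EX <- read.csv('braz.csv', ...): a single column of animal IDs
EX <- data.frame(Ani = c(33701721, 33791336, 33909847))
ids <- as.character(EX$Ani)

# Keep every row whose name appears in the ID list;
# drop = FALSE keeps the result a matrix even if only one row matches
z <- x[rownames(x) %in% ids, , drop = FALSE]

# IDs that were requested but are not present in x
missing_ids <- setdiff(ids, rownames(x))

# Same idea with the animal ID kept as an ordinary data.frame column
EBV_small <- data.frame(animal = as.numeric(rownames(x)), x)
z_df <- EBV_small[EBV_small$animal %in% EX$Ani, ]
```

Checking `missing_ids` is worthwhile with a list of 1,000 animals, because `%in%` silently skips IDs that have no match. The data.frame variant at the end also answers your question about keeping the ID in its own column: it works the same way, and you would not need the matrix conversion at all.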
Upvotes: 1