Tundrahorse

Reputation: 13

Extract rows from a matrix using a list

Good morning all

I am still newish to R, and I searched most forums for an answer to my problem (I suspect I am missing out on a crucial keyword somewhere), so apologies if I duplicate a question. My problem is similar to this question, but the answer does not quite work for me.

I have a matrix with roughly 1.7 million rows and, at this point, 20 columns. For the purposes of this exercise I only need to extract 20 rows from this matrix, but I will need to extract more than 1000 later on. I would like to import a list of all the rows I want and use it to subset the matrix into a smaller one for further analysis, but I keep hitting my head against the wall.

I have created a smaller matrix with just the 2 columns of interest, and set the row names to the animal IDs. The animal IDs are unique. Apologies for the clumsy coding.

EBV<-read.csv(file='bfile.csv', header=F, skip=1, sep=',', col.names=c("animal","anim_name","byear","anim_name_pa","anim_name_ma","sex","wwdir_ebv","wwdir_acc","wwmat_ebv","wwmat_acc","afc_ebv","afc_acc","icp_ebv","icp_acc","shd_ebv","shd_acc","scr_ebv","scr_acc","adg_ebv","adg_acc"))
head(EBV)
tail(EBV)
a<-subset(EBV, select=c(animal))
b<-subset(EBV, select=c(wwdir_ebv,wwdir_acc))
c<-as.numeric(as.character(unlist(a)))
d<-as.numeric(as.character(unlist(b)))
x<-matrix(d, nrow=1708891,ncol=2, byrow=F)
rownames(x)<-c
colnames(x)<-c("wwdir_ebv","wwdir_acc")
head(x)

Results of head(x):

*row.name* wwdir_ebv wwdir_acc
33525056   12.0321        49
33702721   13.6674        46
33791336    6.8078        63
33907452   11.0981        51
33909847    7.4192        67
34165696    8.5039        42

Now what I would like to do is something like this:

EX<-read.csv(file='braz.csv', header=F, sep=',', col.names=c("Ani"))
X<-as.numeric(as.list(unlist(EX)))
z<-subset(x, select=c('X'))

Here the "braz.csv" file contains only a single column with, for argument's sake, animals 33701721, 33791336 and 33909847. Extracting the animals one by one hasn't been too much of a problem so far, but typing 1000 names one by one eventually will be.

I don't know if it would be more effective to keep the animal IDs in a column of their own (i.e., make a matrix of 1.7m x 3 instead of 1.7m x 2) and try to subset according to the column "animalID". My biggest concern is the list that I want to import and use for subsetting.

Thanks in advance!

Upvotes: 0

Views: 1762

Answers (1)

Roland

Reputation: 132864

I don't know why you go to all that trouble of creating matrices instead of using the data.frame returned by read.csv. Your use of subset also confuses me (because select selects columns, but apparently you want to subset by rows).
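As a minimal sketch of staying with the data.frame, the example below uses a small made-up data frame in place of the real read.csv result (the values are copied from head(x) in the question). The `wanted` vector is a hypothetical stand-in for the IDs that would be read from braz.csv.

```r
# Toy stand-in for the data.frame returned by read.csv
# (values taken from the question's head(x) output)
EBV <- data.frame(
  animal    = c(33525056, 33702721, 33791336, 33907452, 33909847, 34165696),
  wwdir_ebv = c(12.0321, 13.6674, 6.8078, 11.0981, 7.4192, 8.5039),
  wwdir_acc = c(49, 46, 63, 51, 67, 42)
)

# Hypothetical ID list; in practice this would come from braz.csv
wanted <- c(33701721, 33791336, 33909847)

# Row condition goes before the comma, column selection after it
res <- EBV[EBV$animal %in% wanted, c("animal", "wwdir_ebv", "wwdir_acc")]
print(res)
```

With this toy data, 33701721 does not occur in the animal column, so it is silently left out of the result.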

It appears you simply need x[rownames(x) %in% unlist(EX),]. Generally, you'll find that [ is not less convenient than subset for subsetting, but more powerful. subset can also result in trouble when used inside functions. I'd advise you to study help("["). help("%in%") might also be worth reading.
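To illustrate the row-name approach on something small, here is a sketch that rebuilds the six rows shown in the question as a matrix. The `EX` data frame is an assumed stand-in for what read.csv would return for braz.csv, and `keep` and `missing_ids` are names introduced only for this example.

```r
# Toy matrix with animal IDs as row names (values from the question's head(x))
x <- matrix(
  c(12.0321, 13.6674, 6.8078, 11.0981, 7.4192, 8.5039,
    49, 46, 63, 51, 67, 42),
  ncol = 2,
  dimnames = list(
    c("33525056", "33702721", "33791336", "33907452", "33909847", "34165696"),
    c("wwdir_ebv", "wwdir_acc")
  )
)

# Assumed stand-in for read.csv('braz.csv', ...): a single column named Ani
EX <- data.frame(Ani = c(33701721, 33791336, 33909847))

# Logical row index: TRUE where the row name is among the wanted IDs
keep <- rownames(x) %in% unlist(EX)
z <- x[keep, , drop = FALSE]  # drop = FALSE keeps a matrix even if one row matches
print(z)

# IDs that were requested but do not occur in x
missing_ids <- setdiff(as.character(unlist(EX)), rownames(x))
print(missing_ids)
```

Rows come back in the order they appear in x, not in the order of the ID list. Checking for requested IDs that are absent, as in the last step, is optional but can catch typos in the list.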

Upvotes: 1
