Reputation: 1369
I have a dataframe in which I want to use certain values as hash keys / dictionary keys (or whatever you call it in your language of choice) for other values in that dataframe. Say I have a dataframe like this which I've read in from a large csv file (only first row shown):
  Plate.name QN.number Well Allele.X.Rn Allele.Y.Rn Call
1 Plate 1_A1    QN2200   A1       1.766       2.791 Both
which in R code would be:
structure(list(Plate.name = structure(1L, .Label = "Plate 1_A1", class = "factor"),
QN.number = structure(1L, .Label = "QN2200", class = "factor"),
Well = structure(1L, .Label = "A1", class = "factor"), Allele.X.Rn = 1.766,
Allele.Y.Rn = 2.791, Call = structure(1L, .Label = "Both", class = "factor")), .Names = c("Plate.name",
"QN.number", "Well", "Allele.X.Rn", "Allele.Y.Rn", "Call"), class = "data.frame", row.names = c(NA,
-1L))
The QN.numbers are unique IDs in my dataset. How do I retrieve data using QN.number as the key for the other values? That is, I want to know the Call or the Allele.X.Rn for a given QN.number. It seems row.names might do the trick, but how would I use them in this instance?
Upvotes: 10
Views: 4241
Reputation: 94307
Using row.names is like this:
> row.names(d)=d$QN.number
> d["QN2200",]
Plate.name QN.number Well Allele.X.Rn Allele.Y.Rn Call
QN2200 Plate 1_A1 QN2200 A1 1.766 2.791 Both
> d["QN2201",]
Plate.name QN.number Well Allele.X.Rn Allele.Y.Rn Call
NA <NA> <NA> <NA> NA NA <NA>
You just use the row name as the first index when subsetting; an ID that isn't present comes back as a row of NAs. You can also use multiple row names:
> d=data.frame(a=letters[1:10],b=runif(10))
> row.names(d)=d$a
> d[c("a","g","d"),]
a b
a a 0.6434431
g g 0.6724661
d d 0.9826392
Now I'm not sure how clever this is, and whether it does sequential search for each row name or faster indexing...
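To make the row-name approach concrete for the question's columns, here is a minimal sketch. The second row (QN2201) and its values are made up for illustration, and the variable names are my own. It also shows two other dictionary-like options: a named vector for a single column, and an environment, which R implements as a hashed key/value store. I haven't benchmarked any of these against each other, so treat the environment as an option to try if row-name lookups turn out to be slow on a large file, not as a guaranteed win.

```r
# Hypothetical sample: only the QN2200 row comes from the question.
d <- data.frame(
  QN.number   = c("QN2200", "QN2201"),
  Allele.X.Rn = c(1.766, 0.512),
  Call        = c("Both", "Allele Y"),
  stringsAsFactors = FALSE
)
rownames(d) <- d$QN.number  # IDs must be unique and non-missing

# Row name plus column name picks out a single cell.
call_by_rowname <- d["QN2200", "Call"]

# A named vector works as a small dictionary for one column.
call_by_id <- setNames(d$Call, d$QN.number)
call_from_vector <- call_by_id[["QN2200"]]

# An environment keyed by ID; each value is the one-row data frame for that ID.
rows_by_id <- list2env(split(d, d$QN.number))
x_from_env <- rows_by_id[["QN2200"]]$Allele.X.Rn
```

One thing to watch with the row-name form: `[` on a data frame partially matches row names, so a truncated ID that is a unique prefix of a real one will still return a row. The named vector with `[[` and the environment both need the exact key.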
Upvotes: 5
Reputation: 121167
Use subset():
subset(your_data, QN.number == "QN2200", Allele.X.Rn)
with() provides an alternative; here the output is a vector rather than another data frame.
with(your_data, Allele.X.Rn[QN.number == "QN2200"])
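A small runnable sketch of the two calls above, to show the difference in return type. The sample rows are assumptions (only QN2200 appears in the question), and `as_df` / `as_vec` are just illustrative names.

```r
# Hypothetical sample data; only the QN2200 row comes from the question.
your_data <- data.frame(
  QN.number   = c("QN2200", "QN2201"),
  Allele.X.Rn = c(1.766, 0.512),
  Call        = c("Both", "Allele Y"),
  stringsAsFactors = FALSE
)

# subset() keeps the data frame structure: one row, one column.
as_df <- subset(your_data, QN.number == "QN2200", Allele.X.Rn)

# with() evaluates the expression inside the data frame and returns a bare vector.
as_vec <- with(your_data, Allele.X.Rn[QN.number == "QN2200"])
```

If you want a plain number to feed into further arithmetic, the with() form saves a step; if you want to keep several columns together, pass them to subset() as `c(Allele.X.Rn, Call)`.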
Upvotes: 4
Reputation:
Assuming that we're storing our data frame in a variable (I'll call it dataframe for now), the following should do it:
dataframe$Allele.X.Rn[which(dataframe$QN.number == <whatever>)]
where, of course, <whatever> is the ID you'd like to look up in QN.number, given as a quoted string such as "QN2200". Note that R is case-sensitive, so the column must be spelled QN.number exactly.
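As a sketch of why the which() wrapper can matter: if the ID column contains missing values, a bare logical index carries the NA through into the result, while which() keeps only the positions that are TRUE. The data below is invented for illustration, including the row with a missing ID.

```r
# Hypothetical sample with one missing ID to show the difference.
dataframe <- data.frame(
  QN.number   = c("QN2200", NA, "QN2201"),
  Allele.X.Rn = c(1.766, 0.9, 0.512),
  stringsAsFactors = FALSE
)

# which() converts the logical test to positions and drops the NA comparison.
with_which <- dataframe$Allele.X.Rn[which(dataframe$QN.number == "QN2200")]

# A bare logical index keeps an NA entry for the row whose ID is missing.
without_which <- dataframe$Allele.X.Rn[dataframe$QN.number == "QN2200"]
```

Here `with_which` has one element and `without_which` has two, the second being NA. If the IDs are guaranteed complete, as the question suggests, the two forms give the same result.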
Upvotes: 1