arandomlypickedname

Reputation: 1369

How do I use elements of a dataframe like hash keys / dictionary keys / primary keys?

I have a dataframe in which I want to use certain values as hash keys / dictionary keys (or whatever you call it in your language of choice) for other values in that dataframe. Say I have a dataframe like this which I've read in from a large csv file (only first row shown):

  Plate.name QN.number Well Allele.X.Rn Allele.Y.Rn Call
1 Plate 1_A1    QN2200   A1       1.766       2.791 Both

which in R code would be:

 structure(list(Plate.name = structure(1L, .Label = "Plate 1_A1", class = "factor"), 
    QN.number = structure(1L, .Label = "QN2200", class = "factor"), 
    Well = structure(1L, .Label = "A1", class = "factor"), Allele.X.Rn = 1.766, 
    Allele.Y.Rn = 2.791, Call = structure(1L, .Label = "Both", class = "factor")), .Names = c("Plate.name", 
"QN.number", "Well", "Allele.X.Rn", "Allele.Y.Rn", "Call"), class = "data.frame", row.names = c(NA, 
-1L))

The QN.numbers are unique IDs in my dataset. How do I retrieve data using the QN.number as a key for the other values? That is, I want to know the Call or the Allele.X.Rn for a given QN.number. It seems row.names might do the trick, but how would I use them in this instance?

Upvotes: 10

Views: 4241

Answers (3)

Spacedman

Reputation: 94307

You can use row.names like this:

> row.names(d)=d$QN.number
> d["QN2200",]
       Plate.name QN.number Well Allele.X.Rn Allele.Y.Rn Call
QN2200 Plate 1_A1    QN2200   A1       1.766       2.791 Both
> d["QN2201",]
   Plate.name QN.number Well Allele.X.Rn Allele.Y.Rn Call
NA       <NA>      <NA> <NA>          NA          NA <NA>

You just use the row name as the first parameter in the subsetting. You can also use multiple row names:

> d=data.frame(a=letters[1:10],b=runif(10))
> row.names(d)=d$a
> d[c("a","g","d"),]
  a         b
a a 0.6434431
g g 0.6724661
d d 0.9826392

Now I'm not sure how clever this is, and whether it does sequential search for each row name or faster indexing...
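If lookup speed matters, one option that does not depend on how row-name matching works internally is an environment, which R can back with a hash table. The following is a minimal sketch, not a benchmark: the data frame `d`, the `index` environment, and the `lookup()` helper are names made up for illustration, and every value except the QN2200 row is invented.

```r
# Illustrative data: only the QN2200 values come from the question,
# the other rows are made up for this sketch.
d <- data.frame(
  QN.number   = c("QN2200", "QN2201", "QN2202"),
  Allele.X.Rn = c(1.766, 0.512, 2.034),
  Call        = c("Both", "Allele X", "Allele Y"),
  stringsAsFactors = FALSE
)

# Build a hashed environment that maps each ID to its row index.
index <- new.env(hash = TRUE)
for (i in seq_len(nrow(d))) {
  assign(d$QN.number[i], i, envir = index)
}

# Hypothetical helper: return the matching row, or NULL for an unknown ID.
lookup <- function(id) {
  if (!exists(id, envir = index, inherits = FALSE)) return(NULL)
  d[get(id, envir = index), ]
}

lookup("QN2200")$Call   # the Call for that ID
lookup("QN9999")        # NULL, since the ID is not in the index
```

The index is built once and has to be rebuilt if rows are added or reordered, so this is only worth it when the same data frame is queried many times.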

Upvotes: 5

Richie Cotton

Reputation: 121167

Use subset.

 subset(your_data, QN.number == "QN2200", Allele.X.Rn)

with provides an alternative; here the output is a vector rather than another data frame.

with(your_data, Allele.X.Rn[QN.number == "QN2200"])
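To make the difference between the two calls concrete, here is a small self-contained sketch. The contents of `your_data` are assumptions for illustration (only the QN2200 values come from the question), and `as_df` / `as_vec` are just names chosen here.

```r
# Illustrative data: the QN2201 row is made up for this sketch.
your_data <- data.frame(
  QN.number   = c("QN2200", "QN2201"),
  Allele.X.Rn = c(1.766, 0.512),
  Call        = c("Both", "Allele X"),
  stringsAsFactors = FALSE
)

# subset() keeps the result as a data frame, even for a single column.
as_df <- subset(your_data, QN.number == "QN2200", Allele.X.Rn)

# with() evaluates the expression inside the data frame and returns a plain vector.
as_vec <- with(your_data, Allele.X.Rn[QN.number == "QN2200"])

class(as_df)    # "data.frame"
class(as_vec)   # "numeric"

# subset() can also pull several columns at once for the same key.
both_cols <- subset(your_data, QN.number == "QN2200", c(Allele.X.Rn, Call))
```

The data frame form is convenient when you want several columns for one ID; the vector form is easier to feed straight into arithmetic.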

Upvotes: 4

user554546

Reputation:

Assuming that we're storing our data frame in a variable name--I'll call it dataframe for now--the following should do it:

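dataframe$Allele.X.Rn[which(dataframe$QN.number == <whatever>)]

Where, of course, <whatever> is the ID that you'd like to look up in QN.number, given as a quoted string such as "QN2200". Note that R is case-sensitive, so the column must be spelled QN.number exactly.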
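As a concrete sketch of this approach, the example below fills in the placeholder with a real ID and shows what `which()` adds over plain logical indexing. The data frame contents are assumptions for illustration (the NA key and the second and third rows are invented), and `with_which` / `without_which` are names chosen here.

```r
# Illustrative data: one key is deliberately missing (NA) to show the difference.
dataframe <- data.frame(
  QN.number   = c("QN2200", NA, "QN2201"),
  Allele.X.Rn = c(1.766, 0.9, 0.512),
  stringsAsFactors = FALSE
)

id <- "QN2200"

# which() returns only the positions where the comparison is TRUE,
# so rows with a missing key are silently skipped.
with_which <- dataframe$Allele.X.Rn[which(dataframe$QN.number == id)]

# Plain logical indexing carries the NA from the comparison into the result.
without_which <- dataframe$Allele.X.Rn[dataframe$QN.number == id]

with_which      # one value: 1.766
without_which   # two values: 1.766 and NA
```

If the key column can never contain NA, the two forms give the same result and `which()` is optional.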

Upvotes: 1
