Reputation: 33
I am trying to automate one of my simulations. I have two sets of data: one is the subject IDs of patients (187 rows long), the other is the sample IDs (3057 rows long). I would like to classify the sample IDs based on the subject they belong to.
For example: if the subject ID is ABCD, the samples taken from that subject will be ABCD-0001, ABCD-0002, and so on.
I am trying to use grep to go through every element of the subject IDs and check whether it is a substring of a sample ID. If so, the value grep returns (the row number in the sample IDs) would be used as the position in a new vector, and the value stored at that position would be the row number of the matching subject ID.
For example:
SubID   SampID
ABCD    ABCD-0001
EFGH    ABCD-0002
IJKL    IJKL-0001
        IJKL-0002
        EFGH-0001
        EFGH-0002
        EFGH-0003
Desired Output
Numeric ID
1
1
3
3
2
2
2
I am using this code:
j = 1:nrow(SubID)
i = 1:nrow(SampID)
for (val in j)
{
for(val in i)
{
if (length(k<-grep(SubID[j,1],SampID[i,1]))>0)
{
l=as.numeric(unlist(k))
Ind[l]=j
}
}
}
Upvotes: 0
Views: 359
Reputation: 9313
There are ways to solve this without using a for loop.
Data:
a = data.frame(subID = c("ab","cd","de"))
b = data.frame(SampID = c("ab-1","ab-2","de-1","de-2","cd-1","cd-2","cd-3"))
> a
subID
1 ab
2 cd
3 de
> b
SampID
1 ab-1
2 ab-2
3 de-1
4 de-2
5 cd-1
6 cd-2
7 cd-3
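To obtain the corresponding index, first extract the leading characters of each sample ID. In my example that is the first two characters; in your case it would be characters 1 to 4, provided all subject IDs have four letters. Then look up the position of that prefix among the subject IDs: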
f = substr(b$SampID,1,2)
b$num = sapply(f, function(x){which(x == a$subID)})  # row of a whose subID equals the prefix
Which gives:
> b
SampID num
1 ab-1 1
2 ab-2 1
3 de-1 3
4 de-2 3
5 cd-1 2
6 cd-2 2
7 cd-3 2
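The same lookup can also be written with `match`, which returns the position of each prefix in the subject ID vector and gives `NA` when a prefix has no matching subject. This is only a sketch on the example data above; the names `prefix` and `num_match` are made up for illustration.

```r
# Example data from above, kept as character columns
a <- data.frame(subID = c("ab", "cd", "de"), stringsAsFactors = FALSE)
b <- data.frame(SampID = c("ab-1", "ab-2", "de-1", "de-2", "cd-1", "cd-2", "cd-3"),
                stringsAsFactors = FALSE)

# First two characters of each sample ID (use 1 to 4 for four-letter IDs)
prefix <- substr(b$SampID, 1, 2)

# Position of each prefix in a$subID; NA if a prefix matches no subject
b$num_match <- match(prefix, a$subID)
print(b)
```

Because `match` is vectorised, it avoids calling a function once per row, which may matter little for 3057 rows but keeps the code short.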
Edit: Subject IDs of different lengths
If the subject IDs in a have different lengths, you can do it with a single for loop. Try this:
a = data.frame(subID = c("ab","cd","def"))
b = data.frame(SampID = c("ab-1","ab-2","def-1","def-2","cd-1","cd-2","cd-3"))
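b$num = 0  # 0 marks samples with no matching subject
for (k in seq_along(a$subID)){
  # grepl flags every SampID that contains the k-th subject ID
  b$num[grepl(pattern = a$subID[k], x = b$SampID)] = k
}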
Here we loop over every element of a and use grepl to find the SampID entries that contain that subject ID. The loop index k is then assigned to the rows for which grepl returns TRUE.
New Results:
> b
SampID num
1 ab-1 1
2 ab-2 1
3 def-1 3
4 def-2 3
5 cd-1 2
6 cd-2 2
7 cd-3 2
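One thing to keep in mind with the loop: grepl matches the pattern anywhere in the string, so a subject ID that happens to occur inside another sample ID would also be matched. If every sample ID has the form "subject ID, hyphen, number" as in the question (this is an assumption about the real data), a loop-free sketch is to strip everything from the last hyphen onwards and look the remainder up with `match`. The name `prefix` is again just illustrative.

```r
# Example data with subject IDs of different lengths
a <- data.frame(subID = c("ab", "cd", "def"), stringsAsFactors = FALSE)
b <- data.frame(SampID = c("ab-1", "ab-2", "def-1", "def-2", "cd-1", "cd-2", "cd-3"),
                stringsAsFactors = FALSE)

# Drop the last hyphen and everything after it, leaving the subject part
prefix <- sub("-[^-]*$", "", b$SampID)

# Exact lookup of the subject part; NA if it matches no subject ID
b$num <- match(prefix, a$subID)
print(b)
```

Since the comparison is an exact match on the whole prefix, partial overlaps between IDs cannot produce a wrong assignment, and unmatched samples show up as `NA` rather than 0.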
Upvotes: 2