Sudharshan Ravi

Reputation: 33

Using grep in a nested for loop

I am trying to automate part of a simulation. I have two sets of data: the subject IDs of patients (187 rows) and the sample IDs (3057 rows). I would like to classify the sample IDs by subject.

For example, if the subject ID is ABCD, the samples taken from that subject will be ABCD-0001, ABCD-0002, and so on.

I am trying to use grep to search for each subject ID within the sample IDs. Where there is a match, the result should go into a new vector: the position in the new vector is the row number that grep returns (i.e. the row number in the sample IDs), and the value stored there is the row number of the matching subject ID.

As in

SubID       SampID

ABCD        ABCD-0001
EFGH        ABCD-0002   
IJKL        IJKL-0001
            IJKL-0002
            EFGH-0001
            EFGH-0002
            EFGH-0003

Desired Output

Numeric ID
1
1
3
3
2
2
2

I am using this code

j = 1:nrow(SubID)
i = 1:nrow(SampID)

for (val in j)
{
  for(val in i)
{
    if (length(k<-grep(SubID[j,1],SampID[i,1]))>0)
    {
      l=as.numeric(unlist(k))
      Ind[l]=j
    }
  }
}

Upvotes: 0

Views: 359

Answers (1)

R. Schifini

Reputation: 9313

There are ways to solve this without using a for loop.

Data:

a = data.frame(subID = c("ab","cd","de"))
b = data.frame(SampID = c("ab-1","ab-2","de-1","de-2","cd-1","cd-2","cd-3"))

> a
  subID
1    ab
2    cd
3    de

> b
  SampID
1   ab-1
2   ab-2
3   de-1
4   de-2
5   cd-1
6   cd-2
7   cd-3

To obtain the corresponding index, first extract the leading characters of each sample ID with substr. In my example that is the first two characters; in yours it would be characters 1 to 4, provided every subject ID has exactly four letters.

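# Subject prefix: the first two characters of each sample ID
f = substr(b$SampID,1,2)
# For each prefix, find its row position in a$subID
b$num = sapply(f,function(x){which(x==a$subID)})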

Which gives:

> b
  SampID num
1   ab-1   1
2   ab-2   1
3   de-1   3
4   de-2   3
5   cd-1   2
6   cd-2   2
7   cd-3   2
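As a possible alternative to the sapply/which combination, base R's match() does the same lookup in one vectorized call. This is only a sketch on the toy data above; the variable name prefix is my own, and stringsAsFactors = FALSE is set so the columns are plain character vectors regardless of R version.

```r
# Toy data mirroring the example above
a <- data.frame(subID = c("ab", "cd", "de"), stringsAsFactors = FALSE)
b <- data.frame(SampID = c("ab-1", "ab-2", "de-1", "de-2", "cd-1", "cd-2", "cd-3"),
                stringsAsFactors = FALSE)

# Subject prefix: the first two characters of each sample ID
prefix <- substr(b$SampID, 1, 2)

# For each prefix, match() gives the position of its first occurrence
# in a$subID, or NA when no subject ID matches
b$num <- match(prefix, a$subID)
print(b)
```

One difference worth noting: a sample with no matching subject gets NA here, whereas which() returns a zero-length result for it, which makes sapply return a list instead of a vector.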

Edit: Different letter lengths

If the subject IDs in a have different lengths, you can do it with a single for loop. Try this:

a = data.frame(subID = c("ab","cd","def"))
b = data.frame(SampID = c("ab-1","ab-2","def-1","def-2","cd-1","cd-2","cd-3"))

b$num = 0
# For each subject ID, mark the samples whose ID contains it
for (k in seq_along(a$subID)){
    b$num[grepl( pattern = a$subID[k] , x = b$SampID)] = k
}

Here we loop through every element of a and use grepl to find the SampID values that contain this pattern, then assign the loop index to the rows where grepl returns TRUE.

New Results:

> b
  SampID num
1   ab-1   1
2   ab-2   1
3  def-1   3
4  def-2   3
5   cd-1   2
6   cd-2   2
7   cd-3   2
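If the subject ID is always the part of the sample ID before the first hyphen (an assumption on my side, based on the ABCD-0001 format in the question), the loop can probably be avoided even for IDs of different lengths. The sketch below strips everything from the first hyphen onward and looks the result up with match(); the name prefix is again hypothetical.

```r
# Toy data with subject IDs of different lengths
a <- data.frame(subID = c("ab", "cd", "def"), stringsAsFactors = FALSE)
b <- data.frame(SampID = c("ab-1", "ab-2", "def-1", "def-2", "cd-1", "cd-2", "cd-3"),
                stringsAsFactors = FALSE)

# Assumption: the subject ID is everything before the first hyphen,
# so drop the first hyphen and whatever follows it
prefix <- sub("-.*$", "", b$SampID)

# Exact lookup of each prefix in a$subID (NA when there is no match)
b$num <- match(prefix, a$subID)
print(b)
```

Because match() compares whole strings, it also sidesteps a pitfall of the unanchored grepl pattern in the loop, which would match a subject ID such as "ab" anywhere inside a sample ID (for instance in "cab-1").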

Upvotes: 2
