PsychometStats

Reputation: 380

R: grepl outputs matches in the wrong order

I have a data frame (called test) with 1 variable and 5,000 rows, where each element is a string.

1. "Am open about my feelings."                   
2. "Take charge."                                 
3. "Talk to a lot of different people at parties."
4. "Make friends easily."                         
5. "Never at a loss for words."                   
6. "Don't talk a lot."                            
7. "Keep in the background."                      
  .....
5000. "Speak softly."           

I want to find and output the row positions of 3 specific strings, which are stored in an object called df: "Speak softly.", "Take charge.", "Don't talk a lot."

I expect to get the following output:

[1] 5000 2 6 

However, the code I am currently using outputs the row indices in ascending order, rather than in the order of the items listed above:

which(grepl(paste(df, collapse = "|"), test[,1])) 

[1] 2 6 5000 

I am really unsure why this occurs. I tried setting grepl options, i.e. fixed = TRUE or perl = TRUE, in the hope that it would change the result, but it didn't. I also searched for a generic 'reorder' function, but it does something very different from what is needed here. Finally, I tried removing the which() call, but that just returns a logical vector of TRUE/FALSE values instead of row positions.

EDIT

Thank you, everyone, for your help with the solution.

# indices for each item, in the correct order
lapply(big7, function(p) grep(pattern = p, test[, 1]))

# TRUE/FALSE for each item, in the correct order
lapply(big7, function(p) grepl(pattern = p, test[, 1]))
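One thing worth checking with the lapply version is what happens when an item matches several rows or none at all. The sketch below uses made-up data (the names items and patterns are hypothetical, not from my real data set) and fixed = TRUE so that "." is treated literally; it only shows why a list is the safer return type here.

```r
# Hypothetical toy data: "Take charge." appears twice, "Sing loudly." never
items    <- c("Take charge.", "Don't talk a lot.", "Take charge.", "Speak softly.")
patterns <- c("Speak softly.", "Take charge.", "Sing loudly.")

# One grep() call per pattern; each list element holds all matching positions
hits <- lapply(patterns, function(p) grep(p, items, fixed = TRUE))

# Number of matches per pattern, in the order of `patterns`
n_hits <- lengths(hits)

print(hits)
print(n_hits)
```

Each list element keeps its own set of positions, so multiple or missing matches do not disturb the order of the other items.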

Upvotes: 1

Views: 740

Answers (1)

IRTFM

Reputation: 263352

Try this, for the reasons in my comment above and because grep returns numeric positions:

  sapply(df, function(p) grep(pattern = p, test[, 1]))
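A minimal, self-contained sketch of the difference, using a made-up 7-row stand-in for the 5,000-row data frame (so "Speak softly." sits in row 7 here, not row 5000). The fixed = TRUE argument and the match() alternative are additions for illustration, not part of the code above; match() only applies if you want exact whole-string equality rather than pattern matching.

```r
# Toy stand-in for the real data frame (hypothetical data, only 7 rows)
test <- data.frame(
  item = c("Am open about my feelings.",
           "Take charge.",
           "Talk to a lot of different people at parties.",
           "Make friends easily.",
           "Never at a loss for words.",
           "Don't talk a lot.",
           "Speak softly."),
  stringsAsFactors = FALSE
)

# Items to look up, in the order the positions are wanted
df <- c("Speak softly.", "Take charge.", "Don't talk a lot.")

# One combined pattern: grepl() returns one TRUE/FALSE per row of test,
# so which() can only report the hits in row order
combined <- which(grepl(paste(df, collapse = "|"), test[, 1]))

# One grep() call per item: results follow the order of df.
# fixed = TRUE treats "." literally instead of as a regex wildcard.
per_item <- sapply(df, function(p) grep(p, test[, 1], fixed = TRUE),
                   USE.NAMES = FALSE)

# Alternative for exact whole-string matches only
exact <- match(df, test[, 1])

print(combined)  # 2 6 7
print(per_item)  # 7 2 6
print(exact)     # 7 2 6
```

The combined pattern loses the item order because the result is indexed by the rows of test, not by the items in df; looping over the items keeps one result per item.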

Upvotes: 4
