Finding protein motifs and their positions in Python

Question

I am new to Python. I want to identify a motif in a large protein data set. Using the one-line code below, I was able to select the proteins I am interested in. However, I also want the start and end positions of the motif in each of these proteins. It would be helpful if someone could suggest what additional arguments I need to use with the code below. Thank you in advance.

import re
# Keep only the rows whose sequence contains the motif anywhere in the string
df.loc[df['Protein_sequence'].str.contains(r"WA[T]R", regex=True)]

     Protein_name     Protein_sequence
242  >PST130_487694   MLRFFRLAALVLLMTSWEVAGDTYDPKTKTTYFGCHKNVDAVCSEP...
358  >Pucstr1_10722   MLRFFRSIALVWLMASWEVSTAGKYPNNPDPVNGAKYFGCHKNVDA...
475  >Pucstr1_2774    MLRFLILTALVLLVASWQVTDTLSQDPGDILFWCHKNVDAVCSETI...

Answers (1)
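
As far as I know, `str.contains` only reports whether a match exists, so there is no extra argument that returns positions. One possible approach, sketched here under assumptions, is to run `re.finditer` on each sequence and read the match offsets. The helper name `find_motif_positions` and the toy sequence are made up for illustration; the pattern is the one from the question. Positions are reported 1-based and inclusive, which is the usual convention for protein coordinates; drop the `+ 1` if you prefer Python's 0-based offsets.

```python
import re

# Motif from the question. "[T]" is a one-character class, so this matches "WATR".
MOTIF = re.compile(r"WA[T]R")

def find_motif_positions(sequence, pattern=MOTIF):
    """Return a (start, end) tuple for every non-overlapping match.

    Positions are 1-based and inclusive (hypothetical helper, not a pandas API).
    """
    # m.start() is 0-based; m.end() is exclusive, so it already equals
    # the 1-based inclusive end position.
    return [(m.start() + 1, m.end()) for m in pattern.finditer(sequence)]

# Toy sequence invented for this example, not taken from the data set above.
example = "MKWATRLLGWATRA"
print(find_motif_positions(example))  # [(3, 6), (10, 13)]
```

To use it on the data frame, something like `df['motif_positions'] = df['Protein_sequence'].apply(find_motif_positions)` should give one list of positions per protein; rows with an empty list have no match and can be filtered out afterwards. Note that `finditer` only finds non-overlapping matches, which is fine for this motif but may matter for patterns that can overlap themselves.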
