Really elementary question, but I can't get this to work. My sample text is provided at the bottom of the page.
The only rows I want left are the ones looking like this: "178-207 30 WVRTRWALLLLFWLGWLGMLAGAVVIIVRA -3,95". I currently use TextWrangler on OSX (terminal and me are not friends), which provides regex replacements. I am trying to do this in steps, and my first step is to get rid of all the protein sequences.
In TextWrangler, I search for this:
Working sequence([^;]*)------------------------------------------------------------
and replace with nothing. However, what I end up with is an almost empty document, as TextWrangler seems to find the first instance of "Working sequence" but the LAST instance of "------------------------------------------------------------". How do I change this so that it is a step-wise process, finding the first instances of both and replacing them with nothing, then the second instances, etc.?
Thanks and greetings from Sweden
Results summary for protein: sp|P08195|4F2_HUMAN 4F2 GN=SLC3A2 PE=1 SV=3 Translocon TM Analysis Results Partitioning: water to bilayer Window range: 19-30
Number of translocon TM predicted segments: 2
178-207 30 WVRTRWALLLLFWLGWLGMLAGAVVIIVRA -3,95
438-460 23 ARLLTSFLPAQLLRLYQLMLFTL 1,63
Working sequence length = 630):
MELQPPEASIAVVSIPRQLPGShSEAGVQGLSAGDDSELGShCVAQTGLELLASGDPLPS ASQNAEMIETGSDCVTQAGLQLLASSDPPALASKNAEVTGTMSQDTEVDMKEVELNELEP EKQPMNAASGAAMSLAGAEKNGLVKIKVAEDEAEAAAAAKFTGLSKEELLKVAGSPGWVR TRWALLLLFWLGWLGMLAGAVVIIVRAPRCRELPAQKWWhTGALYRIGDLQAFQGhGAGN LAGLKGRLDYLSSLKVKGLVLGPIhKNQKDDVAQTDLLQIDPNFGSKEDFDSLLQSAKKK SIRVILDLTPNYRGENSWFSTQVDTVATKVKDALEFWLQAGVDGFQVRDIENLKDASSFL AEWQNITKGFSEDRLLIAGTNSSDLQQILSLLESNKDLLLTSSYLSDSGSTGEhTKSLVT QYLNATGNRWCSWSLSQARLLTSFLPAQLLRLYQLMLFTLPGTPVFSYGDEIGLDAAALP GQPMEAPVMLWDESSFPDIPGAVSANMTVKGQSEDPGSLLSLFRRLSDQRSKERSLLhGD FhAFSAGPGLFSYIRhWDQNERFLVVLNFGDVGLSAGLQASDLPASASLPAKADLLLSTQ PGREEGSPLELERLKLEPhEGLLLRFPYAA
Results summary for protein: sp|Q9NPC4|A4GAT_HUMAN OS=Homo sapiens GN=A4GALT PE=2 SV=1 Translocon TM Analysis Results Partitioning: water to bilayer Window range: 19-30
Number of translocon TM predicted segments: 1
19-43 25 RVCTLFIIGFKFTFFVSIMIYWhVV -1,04
Working sequence length = 353):
MSKPPDLLLRLLRGAPRQRVCTLFIIGFKFTFFVSIMIYWhVVGEPKEKGQLYNLPAEIP CPTLTPPTPPShGPTPGNIFFLETSDRTNPNFLFMCSVESAARThPEShVLVLMKGLPGG NASLPRhLGISLLSCFPNVQMLPLDLRELFRDTPLADWYAAVQGRWEPYLLPVLSDASRI ALMWKFGGIYLDTDFIVLKNLRNLTNVLGTQSRYVLNGAFLAFERRhEFMALCMRDFVDh YNGWIWGhQGPQLLTRVFKKWCSIRSLAESRACRGVTTLPPEAFYPIPWQDWKKYFEDIN PEELPRLLSATYAVhVWNKKSQGTRFEATSRALLAQLhARYCPTThEAMKMYL
Upvotes: 1
Views: 65
Reputation: 1110
You told it to look for "Working sequence" and then anything that's not ';'. The first (and next and next...) line of '-' characters doesn't contain a ';', so the pattern runs straight through them. That's why it's matching everything. It stops at the final line of '-' characters because you told it there should be one at the end. Excluding '-' instead makes the match stop at the first line of dashes. I think this will work for you:
Working sequence([^-]*)------------------------------------------------------------
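To see the difference outside TextWrangler, here is a minimal sketch in Python's `re` module. The miniature `sample` text is an assumption: it is shortened from the question, with the 60-dash separator lines put back and the "Working sequence" header written with parentheses. The lazy `[\s\S]*?` variant and the final line filter are extra illustrations, not part of the original suggestion, and the filter pattern is only a guess at what the segment rows look like.

```python
import re

DASHES = "-" * 60  # separator line, as in the search pattern

# Hypothetical, shortened version of the sample text (two proteins).
sample = "\n".join([
    "Results summary for protein: A",
    "178-207 30 WVRTRWALLLLFWLGWLGMLAGAVVIIVRA -3,95",
    "Working sequence (length = 630):",
    "MELQPPEASIAVVSIPRQLPGS",
    DASHES,
    "Results summary for protein: B",
    "19-43 25 RVCTLFIIGFKFTFFVSIMIYWhVV -1,04",
    "Working sequence (length = 353):",
    "MSKPPDLLLRLLRGAPRQRV",
    DASHES,
])

# Original pattern: [^;]* also matches '-' and newlines, so it runs on
# to the LAST separator line and swallows the second protein's rows.
greedy = re.sub(r"Working sequence([^;]*)" + DASHES, "", sample)

# Suggested pattern: [^-]* stops at the first '-', i.e. at the first
# separator line after each "Working sequence".
fixed = re.sub(r"Working sequence([^-]*)" + DASHES, "", sample)

# Alternative: a lazy quantifier stops at the nearest separator too.
lazy = re.sub(r"Working sequence[\s\S]*?-{60}", "", sample)

# Possible last step: keep only rows shaped like "178-207 30 SEQ -3,95".
segment_rows = [
    line for line in fixed.splitlines()
    if re.match(r"\d+-\d+ \d+ [A-Za-z]+ -?\d+,\d+$", line)
]

print(segment_rows)
```

Note that `[^-]*` only works here because the sequence text itself contains no '-' characters; if it did, the lazy form would be the safer choice.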
Upvotes: 1