Reputation: 5
I'm trying to create a method that will split a protein sequence based on two characters: R and K.
My code splits the protein sequence correctly, but then removes either R or K. I need the program to preserve the delimiters used for splitting the string.
Example:
Let's say I have a protein sequence = GLSDEWQKFEGREGKFWER
My program should then cut the sequence after every R or K.
It should end up like this:
GLSDEWQK
FEGR
EGK
FWER
My code:
Dim protein As String = "GLSDEWQKFEGREGKFWER"
' "R"c and "K"c are Char literals, so this also compiles with Option Strict On
Dim words As String() = protein.Split(New Char() {"R"c, "K"c})
For Each word As String In words
    Console.WriteLine(word)
Next
I am writing this in Visual Basic on .NET Framework 4.7.2, and I want to display the results in the console.
Upvotes: 0
Views: 416
Reputation: 25013
You can use Regex.Split with a capturing group so that the result includes the characters it was split on, then join the resulting array in pairs:
Imports System.Linq
Imports System.Text.RegularExpressions
Dim protein As String = "GLSDEWQKFEGREGKFWER"
Dim splitter = New Regex("([KR])")
Dim wordParts = splitter.Split(protein)
' wordParts is now ("GLSDEWQ", "K", "FEG", "R", "EG", "K", "FWE", "R", "")
' join the wordParts in pairs: each piece followed by the delimiter that ended it
Dim words As New List(Of String)
For i = 0 To wordParts.Length - 2 Step 2
    words.Add(wordParts(i) & wordParts(i + 1))
Next
' if there was an odd number of parts, the last one has no delimiter; add it unless it is empty
If wordParts.Length Mod 2 = 1 AndAlso Not String.IsNullOrEmpty(wordParts.Last()) Then
    words.Add(wordParts.Last())
End If
Console.WriteLine(String.Join(vbCrLf, words))
Outputs:
GLSDEWQK
FEGR
EGK
FWER
The [KR] is a character group: it matches any one of the characters inside the brackets. The parentheses ( ) surrounding it make the regex capture what it matched, and Regex.Split includes captured text in the array it returns.
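To see what the parentheses change, here is a minimal sketch that runs the same split with and without the capturing group. The module name CaptureDemo and the "|" separator are only for illustration and are not part of the answer above.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module CaptureDemo
    Sub Main()
        Dim protein As String = "GLSDEWQKFEGREGKFWER"

        ' Without parentheses, the matched K/R characters are dropped.
        Dim withoutCapture = Regex.Split(protein, "[KR]")
        Console.WriteLine(String.Join("|", withoutCapture))
        ' GLSDEWQ|FEG|EG|FWE|

        ' With a capturing group, each matched K/R comes back as its own element.
        Dim withCapture = Regex.Split(protein, "([KR])")
        Console.WriteLine(String.Join("|", withCapture))
        ' GLSDEWQ|K|FEG|R|EG|K|FWE|R|
    End Sub
End Module
```

The trailing separator in both outputs comes from the empty string after the final R, which is why the pairing code checks for an empty last element.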
Upvotes: 1
Reputation: 32223
String.Split() removes the splitter(s) from the resulting array of strings, but you of course want to preserve the full content.
You could loop over the chars in the protein string (a string is a collection of chars) and append each one to a StringBuilder, then test whether the current char belongs to the array of chars, {"R"c, "K"c}, that cause the string to split.
If it doesn't, just continue with the next char.
If it does, add the accumulated chars (splitter included) to a List(Of String) and clear the buffer. The list will contain the results when the loop terminates.
Your project should already have the required Imports statements. If it doesn't, add:
Imports System.Linq
Imports System.Text
Dim protein As String = "GLSDEWQKFEGREGKFWER"
Dim splitChars = {"R"c, "K"c}
Dim sb As New StringBuilder()
Dim splitResult As New List(Of String)
For Each c As Char In protein
    sb.Append(c)
    ' If the current char is one of the splitters, add the buffer to the
    ' results and clear the buffer
    If splitChars.Contains(c) Then
        splitResult.Add(sb.ToString())
        sb.Clear()
    End If
Next
' Take the remainder, if any
If sb.Length > 0 Then splitResult.Add(sb.ToString())
Then print the list of parts:
For Each section As String In splitResult
    Console.WriteLine(section)
Next
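Since the question asks for a method, the loop above could be wrapped in a reusable function. This is only a sketch: the names ProteinSplitter and SplitKeepingDelimiters are hypothetical, and the ParamArray signature is one possible design choice, not something the question requires.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text

Module ProteinSplitter
    ' Hypothetical helper: cuts the input after every char found in splitChars
    ' and keeps that char at the end of each piece.
    Function SplitKeepingDelimiters(input As String, ParamArray splitChars As Char()) As List(Of String)
        Dim sb As New StringBuilder()
        Dim parts As New List(Of String)
        For Each c As Char In input
            sb.Append(c)
            ' A splitter closes the current piece
            If splitChars.Contains(c) Then
                parts.Add(sb.ToString())
                sb.Clear()
            End If
        Next
        ' Keep any trailing chars that were not followed by a splitter
        If sb.Length > 0 Then parts.Add(sb.ToString())
        Return parts
    End Function

    Sub Main()
        For Each part In SplitKeepingDelimiters("GLSDEWQKFEGREGKFWER", "R"c, "K"c)
            Console.WriteLine(part)
        Next
    End Sub
End Module
```

Passing the splitters as arguments keeps the function usable for other cleavage rules without touching the loop.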
Upvotes: 1