Reputation: 475
I have some long text (e.g. information about many books) in a single string on one line.
I want to extract just the ISBNs (only the number; each number is preceded by the characters "ISBN"). I found code that extracts the first number. The problem is how to create a loop that covers the whole text. Could I use a StreamReader for this example? Thank you for your answers.
Example:
Sub Main()
    Dim getLiteratura As String = "'Author 1. Name of book 1. ISBN 978-80-251-2025-5.', 'Author 2. Name of Book 2. ISBN 80-01-01346.', 'Author 3. Name of book. ISBN 80-85849-83.'"
    Dim test As Integer = getLiteratura.IndexOf("ISBN")
    Dim getISBN As String = getLiteratura.Substring(test + 5, getLiteratura.IndexOf(".", test + 1) - test - 5)
    Console.Write(getISBN)
    Console.ReadKey()
End Sub
Upvotes: 3
Views: 15717
Reputation: 43743
Since you can pass the start position into the IndexOf method, you can loop through the string by starting each search from where the last iteration left off. For instance:
Dim getLiteratura As String = "'Author 1. Name of book 1. ISBN 978-80-251-2025-5.', 'Author 2. Name of Book 2. ISBN 80-01-01346.', 'Author 3. Name of book. ISBN 80-85849-83.'"
Dim isbns As New List(Of String)()
Dim position As Integer = 0
While position <> -1
    position = getLiteratura.IndexOf("ISBN", position)
    If position <> -1 Then
        Dim endPosition As Integer = getLiteratura.IndexOf(".", position + 1)
        If endPosition <> -1 Then
            isbns.Add(getLiteratura.Substring(position + 5, endPosition - position - 5))
        End If
        position = endPosition
    End If
End While
That is about as efficient a method as you are likely to find if the data is already loaded into a string. However, it is not very readable or flexible. If those things concern you more than raw efficiency, you may want to consider using a regular expression (Regex):
For Each i As Match In Regex.Matches(getLiteratura, "ISBN (?<isbn>.*?)\.")
    isbns.Add(i.Groups("isbn").Value)
Next
As you can see, not only is it much easier to read, it is also configurable. You could store the pattern externally in a resource, configuration file, database, etc.
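To illustrate the "configurable" part, here is a minimal sketch in which the pattern is passed in as a parameter. The stricter pattern `StrictIsbnPattern` and the module name are my own assumptions for illustration, not something from the question; it accepts only digits, hyphens and an `X` check character, so it stops at the first character that cannot belong to an ISBN instead of relying on the trailing period.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module IsbnPatternSketch
    ' Hypothetical stricter pattern (an assumption): digits, hyphens and X only.
    Private Const StrictIsbnPattern As String = "ISBN\s+(?<isbn>[0-9Xx-]+)"

    ' The pattern is a parameter, so it could just as well come from a
    ' configuration file or resource instead of a constant.
    Public Function GetIsbns(data As String, pattern As String) As List(Of String)
        Dim isbns As New List(Of String)()
        For Each m As Match In Regex.Matches(data, pattern)
            isbns.Add(m.Groups("isbn").Value)
        Next
        Return isbns
    End Function

    Public Sub Main()
        Dim data As String = "'Author 1. Name of book 1. ISBN 978-80-251-2025-5.', 'Author 2. Name of Book 2. ISBN 80-01-01346.', 'Author 3. Name of book. ISBN 80-85849-83.'"
        For Each isbn As String In GetIsbns(data, StrictIsbnPattern)
            Console.WriteLine(isbn)
        Next
    End Sub
End Module
```

With the sample string from the question, this prints the same three ISBNs as the pattern above. Whether the stricter pattern is preferable depends on how clean your real data is.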
If the data isn't already loaded into a string, and efficiency is the primary concern, you may want to look into using a stream reader so that you only hold a small subset of the data in memory at once. That logic is a bit more complicated, but still not overly difficult.
Here's a simple example of how you could do it via a StreamReader:
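' Requires Imports System.IO, System.Text and System.Text.RegularExpressions.
' "stream" is assumed to be an already-open Stream containing the text.
Dim isbns As New List(Of String)()
Using reader As StreamReader = New StreamReader(stream)
    Dim builder As New StringBuilder()
    Dim isbnRegEx As New Regex("ISBN (?<isbn>.*?)\.")
    While Not reader.EndOfStream
        Dim charValue As Integer = reader.Read()
        If charValue <> -1 Then
            ' Buffer one character at a time and test the buffer for a complete ISBN
            builder.Append(Convert.ToChar(charValue))
            Dim matches As MatchCollection = isbnRegEx.Matches(builder.ToString())
            If matches.Count <> 0 Then
                For Each i As Match In matches
                    isbns.Add(i.Groups("isbn").Value)
                Next
                ' Discard everything read so far once an ISBN has been extracted
                builder.Clear()
            End If
        End If
    End While
End Using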
As you can see, in that example, as soon as a match is found, it adds it to the list and then clears out the builder, which is being used as a buffer. That way, the amount of data being held in memory at one time is never more than the size of one "record".
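One possible refinement, shown here only as a sketch: because the pattern ends with a literal period, a new match can only appear right after a period has been appended, so the regex does not need to run on every character. The function name `ReadIsbns` and the use of a `StringReader` to stand in for a real file or stream are assumptions made to keep the example self-contained.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text
Imports System.Text.RegularExpressions

Module StreamIsbnSketch
    ' Accepts any TextReader (StreamReader, StringReader, ...).
    Public Function ReadIsbns(reader As TextReader) As List(Of String)
        Dim isbns As New List(Of String)()
        Dim builder As New StringBuilder()
        Dim isbnRegEx As New Regex("ISBN (?<isbn>.*?)\.")
        Dim charValue As Integer = reader.Read()
        While charValue <> -1
            Dim c As Char = Convert.ToChar(charValue)
            builder.Append(c)
            ' Only test the buffer when a period arrives, since the
            ' pattern cannot complete on any other character.
            If c = "."c Then
                Dim m As Match = isbnRegEx.Match(builder.ToString())
                If m.Success Then
                    isbns.Add(m.Groups("isbn").Value)
                    ' Drop the consumed record so the buffer stays small
                    builder.Clear()
                End If
            End If
            charValue = reader.Read()
        End While
        Return isbns
    End Function

    Public Sub Main()
        Dim data As String = "'Author 1. Name of book 1. ISBN 978-80-251-2025-5.', 'Author 2. Name of Book 2. ISBN 80-01-01346.', 'Author 3. Name of book. ISBN 80-85849-83.'"
        ' StringReader stands in for a StreamReader over a real file here.
        Using reader As New StringReader(data)
            For Each isbn As String In ReadIsbns(reader)
                Console.WriteLine(isbn)
            Next
        End Using
    End Sub
End Module
```

To read from a file instead, you would pass a `StreamReader` opened on that file to `ReadIsbns`; the buffering logic stays the same.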
Since, based on your comments, you're having trouble getting it to work properly, here is a full working sample which outputs just the ISBN numbers, without any of the surrounding characters. Just create a new VB.NET console application and paste in the following code:
Imports System.Text.RegularExpressions

Module Module1
    Public Sub Main()
        Dim data As String = "'Author 1. Name of book 1. ISBN 978-80-251-2025-5.', 'Author 2. Name of Book 2. ISBN 80-01-01346.', 'Author 3. Name of book. ISBN 80-85849-83.'"
        For Each i As String In GetIsbns(data)
            Console.WriteLine(i)
        Next
        Console.ReadKey()
    End Sub

    Public Function GetIsbns(data As String) As List(Of String)
        Dim isbns As New List(Of String)()
        For Each i As Match In Regex.Matches(data, "ISBN (?<isbn>.*?)\.")
            isbns.Add(i.Groups("isbn").Value)
        Next
        Return isbns
    End Function
End Module
Upvotes: 3
Reputation: 507
Here is my solution:
Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
    Dim outputtext As String = ""
    Dim getLiteratura As String = "'Author 1. Name of book 1. ISBN 978-80-251-2025-5.', 'Author 2. Name of Book 2. ISBN 80-01-01346.', 'Author 3. Name of book. ISBN 80-85849-83.'"
    ' Index of the first "ISBN" marker, or -1 if there is none
    Dim test As Integer = getLiteratura.IndexOf("ISBN")
    While test <> -1
        ' The ISBN ends at the next period after the marker
        Dim endPosition As Integer = getLiteratura.IndexOf(".", test + 1)
        If endPosition = -1 Then
            Exit While
        End If
        outputtext = outputtext & getLiteratura.Substring(test + 5, endPosition - test - 5) & " : "
        ' Continue searching after the ISBN that was just read
        test = getLiteratura.IndexOf("ISBN", endPosition)
    End While
    Label1.Text = outputtext
End Sub
Upvotes: 0
Reputation: 19574
When dealing with a large group of data, I would suggest Regular Expressions.
Try something like this:
Dim getLiteratura As String = "'Author 1. Name of book 1. ISBN 978-80-251-2025-5.', 'Author 2. Name of Book 2. ISBN 80-01-01346.', 'Author 3. Name of book. ISBN 80-85849-83.'"
Dim Pattern As String = "ISBN (.*?)\."
Dim ReturnedMatches As MatchCollection = Regex.Matches(getLiteratura, Pattern)
For Each ReturnedMatch As Match In ReturnedMatches
    MsgBox(ReturnedMatch.Groups(1).ToString)
Next
Also, at the top of your module, include the line Imports System.Text.RegularExpressions.
Hope this points you in the right direction...
Upvotes: 0