jenik2205

Reputation: 475

How to find all occurrences of a specific string in a long text

I have a long text (e.g. information about many books) stored in a single string, all on one line.

I want to extract just the ISBNs (only the number; each number is preceded by the characters "ISBN"). I found code that extracts the first occurrence. The problem is how to loop over the whole text to get all of them. Can I use a StreamReader for this? Thank you for your answers.

Example:

Sub Main()
    Dim getLiteratura As String = "'Author 1. Name of book 1. ISBN 978-80-251-2025-5.', 'Author 2. Name of Book 2. ISBN 80-01-01346.', 'Author 3. Name of book. ISBN 80-85849-83.'"
    Dim test As Integer = getLiteratura.IndexOf("ISBN")
    Dim getISBN As String = getLiteratura.Substring(test + 5, getLiteratura.IndexOf(".", test + 1) - test - 5)

    Console.Write(getISBN)
    Console.ReadKey()
End Sub

Upvotes: 3

Views: 15717

Answers (3)

Steven Doggart

Reputation: 43743

Since you can pass the start position into the IndexOf method, you can loop through the string by starting the search from where the last iteration left off. For instance:

Dim getLiteratura As String = "'Author 1. Name of book 1. ISBN 978-80-251-2025-5.', 'Author 2. Name of Book 2. ISBN 80-01-01346.', 'Author 3. Name of book. ISBN 80-85849-83.'"
Dim isbns As New List(Of String)()
Dim position As Integer = 0
While position <> -1
    position = getLiteratura.IndexOf("ISBN", position)
    If position <> -1 Then
        Dim endPosition As Integer = getLiteratura.IndexOf(".", position + 1)
        If endPosition <> -1 Then
            isbns.Add(getLiteratura.Substring(position + 5, endPosition - position - 5))
        End If
        position = endPosition
    End If
End While

That is about as efficient a method as you are likely to find if the data is already loaded into a string. However, it is not very readable or flexible. If those things concern you more than raw efficiency, you may want to consider using a regular expression (Regex):

For Each i As Match In Regex.Matches(getLiteratura, "ISBN (?<isbn>.*?)\.")
    isbns.Add(i.Groups("isbn").Value)
Next

As you can see, not only is it much easier to read, it is also configurable. You could store the pattern externally in a resource, configuration file, database, etc.
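To illustrate the configurability point, here is a minimal sketch in which the pattern is passed in as a parameter. The helper name ExtractIsbns and the stricter pattern (digits, hyphens, and an optional X) are assumptions made for illustration, not part of the original code; in practice the pattern string could come from a configuration file or resource.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module PatternDemo
    ' Hypothetical helper: the pattern is a parameter so that it can be loaded
    ' from a configuration file or resource instead of being hard-coded.
    ' The pattern must define a named group called "isbn".
    Public Function ExtractIsbns(text As String, pattern As String) As List(Of String)
        Dim result As New List(Of String)()
        For Each m As Match In Regex.Matches(text, pattern)
            result.Add(m.Groups("isbn").Value)
        Next
        Return result
    End Function

    Public Sub Main()
        Dim text As String = "'Author 1. Name of book 1. ISBN 978-80-251-2025-5.', 'Author 2. Name of Book 2. ISBN 80-01-01346.'"
        ' Assumed stricter pattern: accept only digits, hyphens, and X,
        ' so the match stops by itself before the trailing period.
        Dim strictPattern As String = "ISBN (?<isbn>[0-9Xx-]+)"
        For Each isbn As String In ExtractIsbns(text, strictPattern)
            Console.WriteLine(isbn)
        Next
    End Sub
End Module
```

Swapping in a different pattern changes what counts as an ISBN without touching the extraction loop.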

If the data isn't already loaded into a string and efficiency is a primary concern, you may want to look into using a StreamReader so that you only load a small subset of the data into memory at once. That logic is a bit more complicated, but still not overly difficult.

Here's a simple example of how you could do it via a StreamReader. It assumes that stream is an already-open Stream and that System.IO, System.Text, and System.Text.RegularExpressions are imported:

Dim isbns As New List(Of String)()
Using reader As StreamReader = New StreamReader(stream)
    Dim builder As New StringBuilder()
    Dim isbnRegEx As New Regex("ISBN (?<isbn>.*?)\.")
    While Not reader.EndOfStream
        Dim charValue As Integer = reader.Read()
        If charValue <> -1 Then
            builder.Append(Convert.ToChar(charValue))
            Dim matches As MatchCollection = isbnRegEx.Matches(builder.ToString())
            If matches.Count <> 0 Then
                For Each i As Match In matches
                    isbns.Add(i.Groups("isbn").Value)
                Next
                builder.Clear()
            End If
        End If
    End While
End Using

As you can see, in that example, as soon as a match is found, it adds it to the list and then clears out the builder which is being used as a buffer. That way, the amount of data being held in memory at one time is never more than the size of one "record".
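The example above runs the regular expression after every character read. Because the pattern ends with a literal period, a match can only be completed when a period arrives, so one possible variation is to test the buffer only at that point. The sketch below is an illustration under that assumption; the function name ReadIsbns is hypothetical, and a StringReader stands in for a StreamReader over a real file or network stream.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text
Imports System.Text.RegularExpressions

Module StreamDemo
    ' Hypothetical helper: accepts any TextReader (StreamReader, StringReader, ...).
    Public Function ReadIsbns(reader As TextReader) As List(Of String)
        Dim isbns As New List(Of String)()
        Dim builder As New StringBuilder()
        Dim isbnRegEx As New Regex("ISBN (?<isbn>.*?)\.")
        Dim charValue As Integer = reader.Read()
        While charValue <> -1
            Dim c As Char = Convert.ToChar(charValue)
            builder.Append(c)
            ' The pattern ends with a period, so only test the buffer then.
            If c = "."c Then
                Dim m As Match = isbnRegEx.Match(builder.ToString())
                If m.Success Then
                    isbns.Add(m.Groups("isbn").Value)
                    ' Drop the processed record to keep the buffer small.
                    builder.Clear()
                End If
            End If
            charValue = reader.Read()
        End While
        Return isbns
    End Function

    Public Sub Main()
        ' StringReader is a stand-in for a StreamReader over a real stream.
        Using reader As New StringReader("'Author 1. Name of book 1. ISBN 978-80-251-2025-5.', 'Author 2. Name of Book 2. ISBN 80-01-01346.'")
            For Each isbn As String In ReadIsbns(reader)
                Console.WriteLine(isbn)
            Next
        End Using
    End Sub
End Module
```

As in the original example, the buffer keeps growing until an ISBN is found, so input with long stretches that contain no ISBN would still need an upper bound on the buffer size.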

UPDATE

Since, based on your comments, you're having trouble getting it to work properly, here is a full working sample which outputs just the ISBN numbers, without any of the surrounding characters. Just create a new VB.NET console application and paste in the following code:

Imports System.Text.RegularExpressions

Module Module1
    Public Sub Main()
        Dim data As String = "'Author 1. Name of book 1. ISBN 978-80-251-2025-5.', 'Author 2. Name of Book 2. ISBN 80-01-01346.', 'Author 3. Name of book. ISBN 80-85849-83.'"
        For Each i As String In GetIsbns(data)
            Console.WriteLine(i)
        Next
        Console.ReadKey()
    End Sub

    Public Function GetIsbns(data As String) As List(Of String)
        Dim isbns As New List(Of String)()
        For Each i As Match In Regex.Matches(data, "ISBN (?<isbn>.*?)\.")
            isbns.Add(i.Groups("isbn").Value)
        Next
        Return isbns
    End Function
End Module

Upvotes: 3

Derek Meyer

Reputation: 507

Here is my solution:

Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
    Dim outputtext As String = ""
    Dim getLiteratura As String = "'Author 1. Name of book 1. ISBN 978-80-251-2025-5.', 'Author 2. Name of Book 2. ISBN 80-01-01346.', 'Author 3. Name of book. ISBN 80-85849-83.'"
    ' Position of the first "ISBN" marker, or -1 if there is none
    Dim test As Integer = getLiteratura.IndexOf("ISBN")
    While test <> -1
        ' The number ends at the first period after the marker
        Dim endPos As Integer = getLiteratura.IndexOf(".", test + 1)
        If endPos = -1 Then Exit While
        ' Skip "ISBN " (5 characters) and take everything up to the period
        outputtext = outputtext & getLiteratura.Substring(test + 5, endPos - test - 5) & " : "
        ' Look for the next marker after the current one
        test = getLiteratura.IndexOf("ISBN", test + 1)
    End While

    Label1.Text = outputtext
End Sub

Upvotes: 0

John Bustos

Reputation: 19574

When dealing with a large amount of data, I would suggest regular expressions.

Try something like this:

    Dim getLiteratura As String = "'Author 1. Name of book 1. ISBN 978-80-251-2025-5.', 'Author 2. Name of Book 2. ISBN 80-01-01346.', 'Author 3. Name of book. ISBN 80-85849-83.'"
    Dim Pattern As String = "ISBN (.*?)\."
    Dim ReturnedMatches As MatchCollection = Regex.Matches(getLiteratura, Pattern)
    For Each ReturnedMatch As Match In ReturnedMatches
        MsgBox(ReturnedMatch.Groups(1).ToString)
    Next

Also, at the top of your module, include the line Imports System.Text.RegularExpressions

Hope this points you in the right direction...

Upvotes: 0
