Bryant Frankford

Reputation: 391

Break apart string in visual basic

I have multiple strings that I am importing from a file. The format of the string is like this:

Smith, Tom 1/2/62 45484

[Last Name], [First Name] [Date] [Number]

I need a way to break these apart into four variables.

For example, Dim first_name As String would hold the first name, and so on for the other three fields.

I thought I could maybe use regex but I keep hitting a wall with it.

Thanks for any help!

Upvotes: 0

Views: 289

Answers (2)

Wiktor Stribiżew

Reputation: 626691

Another approach is to combine a regex with LINQ:

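' Requires Imports System.Linq and Imports System.Text.RegularExpressions at the top of the file
Dim person As String = "Smith, Tom 1/2/62 45484"
' One named group per field: lastname, firstname, date, id
Dim rxPerson As Regex = New Regex("(?<lastname>[\p{L}\s]+),\s+(?<firstname>[\p{L}\s]+)\s+(?<date>[\d/]+)\s+(?<id>\d+)")
' MatchCollection is non-generic, so Cast(Of Match)() is needed before LINQ can query it
Dim matches_prs As IEnumerable(Of Match) = rxPerson.Matches(person).Cast(Of Match)()
' Project each match into an anonymous type with one property per field
Dim result = (From match In matches_prs
           Select New With {.lastname = match.Groups("lastname").Value,
                        .firstname = match.Groups("firstname").Value,
                        .date = match.Groups("date").Value,
                        .id = match.Groups("id").Value}).ToList()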

The regex matches:

  • (?<lastname>[\p{L}\s]+) - the last name, consisting only of Unicode letters and whitespace
  • ,\s+ - a comma and one or more whitespace characters separating the last name from the first name
  • (?<firstname>[\p{L}\s]+) - the first name, consisting only of Unicode letters and whitespace
  • \s+ - one or more whitespace characters separating the name from the date
  • (?<date>[\d/]+) - the date, consisting only of digits and slashes
  • \s+ - one or more whitespace characters separating the date from the id
  • (?<id>\d+) - the id, consisting only of digits.
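
To see what the "letters and whitespace" character classes accept and reject, here is a minimal sketch. The module name PersonLines, the function SplitPerson, the added ^ and $ anchors, and every sample line except the first are assumptions made for illustration, not part of the answer above.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module PersonLines
    ' Same pattern as above, with ^ and $ added (an assumption) so the whole line must match.
    Private ReadOnly RxPerson As New Regex("^(?<lastname>[\p{L}\s]+),\s+(?<firstname>[\p{L}\s]+)\s+(?<date>[\d/]+)\s+(?<id>\d+)$")

    ' Returns the four fields of one line, or Nothing if the line does not match.
    Function SplitPerson(line As String) As String()
        Dim m As Match = RxPerson.Match(line.Trim())
        If Not m.Success Then Return Nothing
        Return New String() {m.Groups("lastname").Value, m.Groups("firstname").Value,
                             m.Groups("date").Value, m.Groups("id").Value}
    End Function

    Sub Main()
        ' Only the first sample comes from the question; the others are made up.
        Dim samples As String() = {"Smith, Tom 1/2/62 45484",
                                   "Van Buren, Mary Ann 3/4/70 123",
                                   "O'Brien, Pat 1/1/80 7"}
        For Each line As String In samples
            Dim parts As String() = SplitPerson(line)
            If parts Is Nothing Then
                Console.WriteLine("No match: " & line)
            Else
                Console.WriteLine(String.Join(" | ", parts))
            End If
        Next
    End Sub
End Module
```

Names containing spaces are accepted because \s is part of both character classes, while the apostrophe in the last sample is not a letter, so that line is rejected. If your data contains such names, the character classes would need to be widened.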

Upvotes: 0

Steven Doggart

Reputation: 43743

Yes, RegEx is a great option for this. Here's how you could do it in VB:

' Requires Imports System.Text.RegularExpressions at the top of the file
Dim input As String = "Smith, Tom 1/2/62 45484"
Dim pattern As String = "(?<last>.*?), (?<first>.*?) (?<date>\S+) (?<number>\d+)"
' Matches returns every match in the input; each match exposes its named groups
For Each m As Match In Regex.Matches(input, pattern)
    Dim last As String = m.Groups("last").Value
    Dim first As String = m.Groups("first").Value
    Dim [date] As String = m.Groups("date").Value
    Dim number As String = m.Groups("number").Value
Next

You may need to adjust the pattern to match your needs. Here's the meaning of the pattern as I demonstrated it:

  • (?<last>.*?) - Captures the last name portion of the string
    • ( - Begins a capturing group
    • ?<last> - Gives a name to the capturing group
    • . - Any character
    • *? - Any number of times (any length of characters), non-greedy. Placing the ? after the * is what makes it non-greedy. Non-greedy means that it captures as little of the string as possible (i.e. only up to the first comma rather than up to the last comma)
    • ) - Ends the capturing group
  • ", " (a comma and a space) - There must be a comma followed by a space between the last and first names
  • (?<first>.*?) - Captures the first name. .*? captures any length of any characters, non-greedy.
  • " " (a space) - There must be a single space between the first name and the date
  • (?<date>\S+) - Captures the date. \S+ captures one or more non-whitespace characters.
  • " " (a space) - There must be a single space between the date and the number
  • (?<number>\d+) - Captures the number. \d+ captures one or more digit characters.

I used named groups so that the code is clearer and more readable. Alternatively, you could use numbered groups and read them by index (e.g. m.Groups(1).Value for the first captured group; m.Groups(0) is the entire match).
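
As a rough illustration of reading groups by index, the sketch below uses the same pattern with unnamed groups. The module name NumberedGroups and the function SplitByIndex are hypothetical. Note that in .NET, Groups(0) holds the entire match, so the captured fields start at index 1.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module NumberedGroups
    ' Same shape as the pattern above, but with unnamed groups (a variation for illustration).
    Private Const Pattern As String = "(.*?), (.*?) (\S+) (\d+)"

    ' Returns the captured fields in order, or Nothing when the input does not match.
    Function SplitByIndex(input As String) As String()
        Dim m As Match = Regex.Match(input, Pattern)
        If Not m.Success Then Return Nothing
        ' Groups(0) is the entire match; the captures start at index 1.
        Return New String() {m.Groups(1).Value, m.Groups(2).Value,
                             m.Groups(3).Value, m.Groups(4).Value}
    End Function

    Sub Main()
        Dim fields As String() = SplitByIndex("Smith, Tom 1/2/62 45484")
        Console.WriteLine(String.Join(" | ", fields))
    End Sub
End Module
```

For the sample line this prints Smith | Tom | 1/2/62 | 45484. Index-based access is shorter, but it breaks silently if a group is later added or reordered, which is the main argument for the named version.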

Also, I used a loop to go through all of the results from Matches. However, if you only pass one line at a time, so that the input can contain at most one match, you could use the Match method instead, which is a little simpler:

Dim m As Match = Regex.Match(input, pattern)
If m.Success Then
    Dim last As String = m.Groups("last").Value
    Dim first As String = m.Groups("first").Value
    Dim [date] As String = m.Groups("date").Value
    Dim number As String = m.Groups("number").Value
End If
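
Since the question mentions importing several strings from a file, the single-match approach could be wrapped in a helper that is called once per line. This is only a sketch: the class PersonRecord, its property names, the module PersonParser, and the second sample line are all made up for illustration, and lines that do not match are simply skipped, which may not be what you want.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

' Hypothetical container for the four fields; the names are not from the question.
Public Class PersonRecord
    Public Property LastName As String
    Public Property FirstName As String
    Public Property DateText As String
    Public Property Number As String
End Class

Public Module PersonParser
    Private Const Pattern As String = "(?<last>.*?), (?<first>.*?) (?<date>\S+) (?<number>\d+)"

    ' Parses every line that matches the pattern and skips the rest.
    Public Function ParseLines(lines As IEnumerable(Of String)) As List(Of PersonRecord)
        Dim records As New List(Of PersonRecord)()
        For Each line As String In lines
            Dim m As Match = Regex.Match(line, Pattern)
            If m.Success Then
                records.Add(New PersonRecord With {
                    .LastName = m.Groups("last").Value,
                    .FirstName = m.Groups("first").Value,
                    .DateText = m.Groups("date").Value,
                    .Number = m.Groups("number").Value})
            End If
        Next
        Return records
    End Function

    Sub Main()
        ' The second sample line is invented for the demo.
        Dim sample As String() = {"Smith, Tom 1/2/62 45484", "Jones, Ann 3/4/75 101"}
        For Each r As PersonRecord In ParseLines(sample)
            Console.WriteLine(r.LastName & " / " & r.FirstName & " / " & r.DateText & " / " & r.Number)
        Next
    End Sub
End Module
```

Because ParseLines accepts any IEnumerable(Of String), the lines could come from System.IO.File.ReadLines with your own file path instead of the in-memory array used here. The date and number are kept as strings; converting them is left out because the question does not say how the two-digit year should be interpreted.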

Upvotes: 5
