Reputation: 391
I have multiple strings that I am importing from a file. The format of the string is like this:
Smith, Tom 1/2/62 45484
[Last Name], [First Name] [Date] [Number]
I need a way to break these apart into four variables, e.g. Dim first_name As String = [first name], and so on for the other three fields.
I thought I could maybe use regex but I keep hitting a wall with it.
Thanks for any help!
Upvotes: 0
Views: 289
Reputation: 626691
Another approach is to combine a regex with LINQ:
' Requires Imports System.Linq and Imports System.Text.RegularExpressions
Dim person As String = "Smith, Tom 1/2/62 45484"
Dim rxPerson As New Regex("(?<lastname>[\p{L}\s]+),\s+(?<firstname>[\p{L}\s]+)\s+(?<date>[\d/]+)\s+(?<id>\d+)")
' Cast(Of Match)() exposes the MatchCollection as IEnumerable(Of Match) so LINQ can query it
Dim matches_prs As IEnumerable(Of Match) = rxPerson.Matches(person).Cast(Of Match)()
' Project each match into an anonymous type with one property per named group
Dim result = (From match In matches_prs
              Select New With {.lastname = match.Groups("lastname").Value,
                               .firstname = match.Groups("firstname").Value,
                               .date = match.Groups("date").Value,
                               .id = match.Groups("id").Value}).ToList()
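To see what the query yields, here is a minimal, self-contained sketch of the same regex-plus-LINQ approach as a console program. The module name PersonParserDemo and the Console output are illustrative assumptions, not part of the original answer, and the code assumes Option Infer is on (the default in new projects).

```vbnet
Imports System
Imports System.Linq
Imports System.Text.RegularExpressions

Module PersonParserDemo ' hypothetical name, for illustration only
    Sub Main()
        Dim person As String = "Smith, Tom 1/2/62 45484"
        Dim rxPerson As New Regex("(?<lastname>[\p{L}\s]+),\s+(?<firstname>[\p{L}\s]+)\s+(?<date>[\d/]+)\s+(?<id>\d+)")

        ' Project every match into an anonymous type with one property per named group
        Dim result = (From m In rxPerson.Matches(person).Cast(Of Match)()
                      Select New With {.lastname = m.Groups("lastname").Value,
                                       .firstname = m.Groups("firstname").Value,
                                       .date = m.Groups("date").Value,
                                       .id = m.Groups("id").Value}).ToList()

        ' Each element carries the four fields as strings
        For Each p In result
            Console.WriteLine("{0} | {1} | {2} | {3}", p.lastname, p.firstname, p.date, p.id)
        Next
        ' prints: Smith | Tom | 1/2/62 | 45484
    End Sub
End Module
```

Because the first-name group is greedy, it initially swallows the space after "Tom" and then backtracks by one character so that the following \s+ can match.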
The regex matches:
(?<lastname>[\p{L}\s]+)
- The last name, which consists only of Unicode letters and spaces
,\s+
- A comma and 1 or more whitespace characters separating the last name and the first name
(?<firstname>[\p{L}\s]+)
- The first name, which consists only of Unicode letters and spaces
\s+
- 1 or more whitespace characters separating the name and the date
(?<date>[\d/]+)
- A date element (digits and slashes)
\s+
- 1 or more whitespace characters separating the date and the id
(?<id>\d+)
- The id, which consists only of digits
Upvotes: 0
Reputation: 43743
Yes, RegEx is a great option for this. Here's how you could do it in VB:
Dim input As String = "Smith, Tom 1/2/62 45484"
Dim pattern As String = "(?<last>.*?), (?<first>.*?) (?<date>\S+) (?<number>\d+)"
For Each m As Match In Regex.Matches(input, pattern)
    Dim last As String = m.Groups("last").Value
    Dim first As String = m.Groups("first").Value
    Dim [date] As String = m.Groups("date").Value
    Dim number As String = m.Groups("number").Value
Next
You may need to adjust the pattern to match your needs. Here's the meaning of the pattern as I demonstrated it:
(?<last>.*?)
- Captures the last name portion of the string. It breaks down as follows:
  (
  - Begins a capturing group
  ?<last>
  - Gives a name to the capturing group
  .
  - Any character (except a newline)
  *?
  - Any number of times (any length of characters), non-greedy. Placing the ? after the * is what makes it non-greedy. Non-greedy just means that it will capture as little of the string as possible (i.e. only until the first comma rather than until the last comma)
  )
  - Ends the capturing group
, 
- There must be a comma followed by a space between the last and first names
(?<first>.*?)
- Captures the first name. .*? captures any length of any characters, non-greedy.
(a single space)
- There must be a single space between the first name and the date
(?<date>\S+)
- Captures the date. \S+ captures one or more non-whitespace characters.
(a single space)
- There must be a single space between the date and the number
(?<number>\d+)
- Captures the number. \d+ captures one or more digit characters.
I used named groups so that the code is clearer and more readable. You could alternatively use unnamed groups and read them by index. Note that index 0 is the entire match, so the captured groups start at 1 (e.g. m.Groups(1).Value for the last name).
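As a rough illustration of reading groups by index, the sketch below uses the same pattern with the group names removed. The module name NumberedGroupsDemo is made up for this example. Unnamed groups are numbered from left to right starting at 1, while index 0 holds the entire match.

```vbnet
Imports System
Imports System.Text.RegularExpressions

Module NumberedGroupsDemo ' hypothetical name, for illustration only
    Sub Main()
        Dim input As String = "Smith, Tom 1/2/62 45484"
        ' Same pattern as above, but with unnamed (numbered) groups
        Dim pattern As String = "(.*?), (.*?) (\S+) (\d+)"

        Dim m As Match = Regex.Match(input, pattern)
        If m.Success Then
            Console.WriteLine(m.Groups(0).Value) ' entire match: Smith, Tom 1/2/62 45484
            Console.WriteLine(m.Groups(1).Value) ' last name: Smith
            Console.WriteLine(m.Groups(2).Value) ' first name: Tom
            Console.WriteLine(m.Groups(3).Value) ' date: 1/2/62
            Console.WriteLine(m.Groups(4).Value) ' number: 45484
        End If
    End Sub
End Module
```

Index-based access is shorter, but it breaks silently if a group is later added or reordered, which is one reason to prefer the named version.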
Also, I used a loop to look through all of the results from Matches. However, if you are only going to give RegEx one line at a time, or something like that, where the input can only contain a single match, then you could use the Match method instead, which is a little easier:
Dim m As Match = Regex.Match(input, pattern)
If m.Success Then
    Dim last As String = m.Groups("last").Value
    Dim first As String = m.Groups("first").Value
    Dim [date] As String = m.Groups("date").Value
    Dim number As String = m.Groups("number").Value
End If
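Since the question mentions many strings coming from a file, here is a hedged sketch of applying the Match approach line by line. Several things in it are assumptions rather than part of the answer above: the module name ParseLinesDemo, the in-memory array standing in for the file, the made-up second and third sample lines, the ^ and $ anchors, and the guess that dates are month/day/two-digit-year.

```vbnet
Imports System
Imports System.Globalization
Imports System.Text.RegularExpressions

Module ParseLinesDemo ' hypothetical name, for illustration only
    ' Assumption: ^ and $ are added so the whole line has to fit the format
    Private ReadOnly LinePattern As New Regex("^(?<last>.*?), (?<first>.*?) (?<date>\S+) (?<number>\d+)$")

    Sub Main()
        ' Stand-in for the lines read from the file; only the first line comes from the question
        Dim lines As String() = {"Smith, Tom 1/2/62 45484",
                                 "Jones, Mary Ann 11/23/75 102",
                                 "not a record"}

        For Each line As String In lines
            Dim m As Match = LinePattern.Match(line)
            If Not m.Success Then
                Console.WriteLine("Skipped: " & line)
                Continue For
            End If

            Dim last As String = m.Groups("last").Value
            Dim first As String = m.Groups("first").Value
            Dim number As String = m.Groups("number").Value

            ' Assumption: the date is month/day/two-digit-year; adjust the format string if not
            Dim parsedDate As Date
            If Date.TryParseExact(m.Groups("date").Value, "M/d/yy", CultureInfo.InvariantCulture,
                                  DateTimeStyles.None, parsedDate) Then
                Console.WriteLine("{0}, {1}, {2:yyyy-MM-dd}, {3}", last, first, parsedDate, number)
            Else
                Console.WriteLine("Unreadable date in: " & line)
            End If
        Next
    End Sub
End Module
```

The anchors matter for names that contain a space. Without them, the non-greedy first-name group in a line like the made-up "Jones, Mary Ann 11/23/75 102" can stop at "Mary", leaving "Ann" as the date and "11" as the number. How a two-digit year maps to a century depends on the calendar settings of the culture used for parsing, so check that against your data.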
Upvotes: 5