Reputation: 8337
I have a string like below.
Fax : 666-111-2222 Phone # : 200100200
I want to find the phone number. The problem is that the number of spaces after Phone and after # can vary between the strings I need to extract data from. I would also like to avoid writing a complex function, because I have a large dataset to process.
I tried the code below, and it gives me the correct starting index regardless of the number of spaces. But I cannot work out the position after the : from that result.
System.Globalization.CultureInfo.InvariantCulture.CompareInfo.IndexOf(FullString,"Phone#:",System.Globalization.CompareOptions.IgnoreSymbols)
Upvotes: 0
Views: 287
Reputation: 5629
This is clearly a job for regular expressions.
String toMatch = "Fax : 666-111-2222 Phone # : 200100200";
Regex matchPhone = new Regex("\\bPhone\\s*#\\s*:\\s*");
MatchCollection matches = matchPhone.Matches(toMatch);
foreach (Match match in matches)
{
Int32 position = match.Index + match.Length;
// do whatever you want with the result here
}
In code, the backslashes are doubled, but the actual regex in there is:
\bPhone\s*#\s*:\s*
\b indicates a word boundary, meaning the start or end of a word. This prevents something like "MegaPhone" from matching as well.
\s means any type of whitespace. This matches spaces, tabs, and line breaks.
* means zero or more repetitions, so it doesn't matter whether the whitespace is absent or a hundred spaces long; it will match regardless.
Note that this only gives you the index at which each phone number found in the given string starts. You didn't specify whether there is any way to detect the end of a phone number, or whether the numbers follow a specific format, so that's not included. If you want that, and you don't know exactly what may follow the phone number, look into regex character classes for matching specific numeric content, and use a capture group to extract it from the matched content.
If there is just a single match expected in the whole string, it can just be done with
String toMatch = "Fax : 666-111-2222 Phone # : 200100200";
Regex matchPhone = new Regex("\\bPhone\\s*#\\s*:\\s*");
Match match = matchPhone.Match(toMatch);
Int32 position = match.Index + match.Length;
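The capture-group suggestion above could be sketched roughly as follows. This is only an illustration: the assumption that a number is made of digits optionally separated by hyphens, and the names PhoneExtractor and ExtractPhone, are hypothetical and not part of the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhoneExtractor
{
    // Assumption: a phone number is one or more digit groups joined by hyphens.
    // The Regex is created once and reused, which helps on a large dataset.
    private static readonly Regex PhonePattern =
        new Regex(@"\bPhone\s*#\s*:\s*(\d+(?:-\d+)*)");

    // Returns the captured number, or null when the "Phone # :" label is not found.
    public static string ExtractPhone(string input)
    {
        Match match = PhonePattern.Match(input);
        return match.Success ? match.Groups[1].Value : null;
    }

    public static void Main()
    {
        Console.WriteLine(ExtractPhone("Fax : 666-111-2222 Phone # : 200100200")); // 200100200
        Console.WriteLine(ExtractPhone("Fax : 666-111-2222 Phone#:555-0100"));     // 555-0100
    }
}
```

Checking match.Success before reading the group avoids silently getting an empty value when the label is missing.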
Upvotes: 1
Reputation: 268
I think you should use regex:
Regex rxPhone = new Regex(@"Phone\s*#\s*:\s*(\d+)");
Match match = rxPhone.Match(stringToMatch);
if (match.Success) // the phone number may not always exist
{
string strPhoneNumber = match.Groups[1].Value;
int intPhoneNumber = int.Parse(match.Groups[1].Value);
int position = match.Groups[1].Index;
// just pick the one you need
}
Upvotes: 0
Reputation: 8782
If you can rely on the format, then that's quite straightforward.
Just strip all spaces from the string (.Replace(" ", string.Empty)), then split on the characters after which the phone number starts, e.g. "#:":
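// Last() requires "using System.Linq;"
var phoneFull = @"Fax : 666-111-2222 Phone # : 200100200";
var phone = phoneFull
.Replace(" ", string.Empty)
.Split(new[] { "#:" }, StringSplitOptions.None) // this overload also compiles on .NET Framework
.Last();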
Upvotes: 0
Reputation: 15091
You have a space between Phone and #, also between # and :. Substring with a single parameter will return a string from that index to the end of the input string. Trim will remove any whitespace on either side.
Private Function GetPhone(input As String) As String
Dim i = input.IndexOf("Phone")
Dim s = input.Substring(i)
Dim splits = s.Split(":"c)
Return splits(1).Trim
End Function
I ran the Function 10,000 times and it took 5 milliseconds.
Private Sub Button1_Click(sender As Object, e As EventArgs) Handles Button1.Click
Dim s = "Fax: 666-111-2222 Phone # : 200100200"
Dim Phone As String = ""
Dim sw As New Stopwatch
sw.Start()
For i = 1 To 10_000
Phone = GetPhone(s)
Next
sw.Stop()
Debug.Print(sw.ElapsedMilliseconds.ToString)
MessageBox.Show(Phone)
End Sub
Upvotes: 1
Reputation: 6292
I'm assuming you need a C# answer.
I would use regular expressions, but if you insist on using IndexOf you can do:
string fullString = "Fax : 666-111-2222 Phone # : 200100200";
int phonePos = fullString.IndexOf("Phone");
int hashPos = fullString.IndexOf("#", phonePos+"Phone".Length);
int colonPos = fullString.IndexOf(":", hashPos+1);
That way you have the absolute position of the colon, no matter how many spaces.
Note that I use String.IndexOf. There's no reason to dig it out of the CompareInfo as you do.
Also note that I use the overload that takes an extra parameter, which is the start index.
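To go from the colon position to the number itself, something like the following might work. It is a minimal sketch that assumes the phone number is the last thing in the string; the names IndexOfPhone and ExtractPhone are made up for illustration.

```csharp
using System;

public static class IndexOfPhone
{
    // Returns the trimmed text after "Phone ... # ... :", or null if any marker is missing.
    // Assumption: nothing else follows the phone number in the string.
    public static string ExtractPhone(string fullString)
    {
        int phonePos = fullString.IndexOf("Phone", StringComparison.Ordinal);
        if (phonePos < 0) return null;

        int hashPos = fullString.IndexOf('#', phonePos + "Phone".Length);
        if (hashPos < 0) return null;

        int colonPos = fullString.IndexOf(':', hashPos + 1);
        if (colonPos < 0) return null;

        // Everything after the colon, with surrounding whitespace removed.
        return fullString.Substring(colonPos + 1).Trim();
    }

    public static void Main()
    {
        Console.WriteLine(ExtractPhone("Fax : 666-111-2222 Phone # : 200100200")); // 200100200
    }
}
```

The checks for -1 matter because IndexOf returns -1 when nothing is found, and passing that on as a start index would throw.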
Upvotes: 0