JasperMW

Reputation: 533

IndexOf behaves differently than expected in C#

I've got the following three lines of code, with html being an html page stored as a string.

int startIndex = html.IndexOf("<title>") + 8; // <title> plus a space equals 8 characters
int endIndex = html.IndexOf("</title>") - 18; // -18 is because of the input, there are 18 extra characters after the username.
result = new Tuple<string, bool>(html.Substring(startIndex, endIndex), false);

With the input <title>Username012345678912141618</title> I would expect an output of Username. However, the code can't find the </title>. I'm not sure what's going wrong. Does anyone know what could cause this behaviour? I've tested it with three different webpages (all from the same site), of which I inspected the content.

Upvotes: 1

Views: 137

Answers (2)

Barns

Reputation: 4848

I realize the OP was asking about the IndexOf method, but here is a solution that uses a different approach: regular expressions, which are well suited to "surgically" extracting data from strings.

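The following pattern is all that is needed to extract the "Username" from the title tag. The lazy group captures the name, and .{18} skips the 18 extra characters that follow it:

var pattern = @"<title>\s*(.+?).{18}</title>";

This pattern would be used as follows:

var pattern = @"<title>\s*(.+?).{18}</title>";
var ms = Regex.Match(html, pattern, RegexOptions.IgnoreCase);
// Groups always contains at least the whole-match group, so check Success instead of Groups.Count.
var userName = ms.Success ? ms.Groups[1].Value : string.Empty;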

One advantage of Regex is that you can use the exact text that you are using to search for the data you need. No need to fumble around with adding or subtracting "places" from the index.

You will need to add:

using System.Text.RegularExpressions;

at the top of the file in which you use Regex.
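
For reuse, the regex approach can be wrapped in a small helper. This is only a minimal sketch, assuming the username is always followed by a fixed number of extra characters, as in the question's sample input. The class name TitleRegex, the method name, and the suffixLength parameter are hypothetical and not part of the original code.

```csharp
using System.Text.RegularExpressions;

public static class TitleRegex
{
    // Hypothetical helper: returns the title text without its fixed-length suffix,
    // or an empty string when no matching <title> element is found.
    // suffixLength defaults to 18, the value taken from the question's sample input.
    public static string ExtractUsername(string html, int suffixLength = 18)
    {
        // (?<name>.*?) lazily captures the name; .{n} consumes the n trailing characters.
        string pattern = @"<title>\s*(?<name>.*?).{" + suffixLength + @"}</title>";
        Match m = Regex.Match(html, pattern, RegexOptions.IgnoreCase);

        // Match.Success is the reliable check; Groups is never empty.
        return m.Success ? m.Groups["name"].Value : string.Empty;
    }
}
```

Calling TitleRegex.ExtractUsername(html) on a page containing the sample title from the question should give back just the name part. The named group keeps the call site readable if the pattern later gains more groups.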

Upvotes: 0

Guru Stron

Reputation: 141565

The two-parameter overload of String.Substring has the signature String.Substring(int startIndex, int length): the second parameter is the number of characters in the substring, not an end index. So you need to do something like this (taking your comment into account):

int startIndex = html.IndexOf("<title>") + 8;
int endIndex = html.IndexOf("</title>");
var result = new Tuple<string, bool>(html.Substring(startIndex, endIndex - startIndex - 18), false);
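
A slightly more defensive variant might look like the sketch below. It assumes the same fixed count of trailing characters as in the question, and it guards against IndexOf returning -1 when a tag is missing. The class name TitleExtractor, the method name, and the trailing parameter are hypothetical, introduced only for illustration.

```csharp
using System;

public static class TitleExtractor
{
    // Hypothetical helper: returns the title text minus `trailing` characters,
    // or an empty string if a tag is missing or the title is too short.
    public static string ExtractUsername(string html, int trailing)
    {
        const string openTag = "<title>";
        const string closeTag = "</title>";

        // IndexOf returns -1 when the text is not found.
        int open = html.IndexOf(openTag, StringComparison.OrdinalIgnoreCase);
        if (open < 0) return string.Empty;

        // Start right after the opening tag instead of hard-coding an offset.
        int startIndex = open + openTag.Length;
        int endIndex = html.IndexOf(closeTag, startIndex, StringComparison.OrdinalIgnoreCase);
        if (endIndex < 0) return string.Empty;

        // Substring takes a length, so convert the end index into a character count.
        int length = endIndex - startIndex - trailing;
        if (length < 0) return string.Empty;

        // Trim removes a leading space such as the one mentioned in the question's comment.
        return html.Substring(startIndex, length).Trim();
    }
}
```

Using openTag.Length and Trim avoids depending on whether a space follows the opening tag, which the sample input and the comment in the question disagree on.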

Upvotes: 3
