Regex to extract info out of large html source?

Question

In among a lot of HTML source I have some elements like this:

I also have a dictionary declared like Dictionary<string, int> Places = new Dictionary<string, int>();

What I want to do is extract the city name out of the HTML and put it into the string key of Places, and extract the number code and put it into the int value. For the first one I would add Places.Add("Manama", 15); The country name can be ignored. The idea, though, is to scan the HTML source and add the cities automatically.

This is what I have so far:

string[] temp = htmlContent.Split('\n');
List<string> temp2 = new List<string>();
foreach (string s in temp)
{
    if (s.Contains("



This cuts out some of the text, but then I more or less get stuck wondering how to extract the relevant parts from it. It's really bad, I know, but I'm learning :(

Ryan Durrant · Accepted Answer

If the only relevant data you are looking for is within

string[] options = Regex.Split(theSource, "

To get the number, you can loop the if statement with a pointer (an index into the string) if you need to handle longer numbers. If the numbers are not always over 10, just loop the if statement with a pointer and ignore the first lines.

Then I would re-use the string theString:

string[] place = Regex.Split(options[x], " - "); // split immediately after the city name
theString = place[0].Substring(y, place[0].Length - y); // y is the index where the name starts

And then add them with

Places.Add(theString, theInt);

This should work. If the code doesn't work straight away, the algorithm will; just make sure the spelling is right and that the variables are doing what they should.
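
For comparison, the same idea can be sketched with a single pattern and named groups instead of splitting and slicing by hand. This is only a sketch under assumptions: the sample markup from the question is not shown here, so the code assumes each entry looks like <option value="15">Manama - Bahrain</option>. The names PlaceExtractor, Extract and OptionPattern are made up for illustration, and the pattern would need adjusting if the real markup differs.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class PlaceExtractor
{
    // ASSUMPTION: entries look like <option value="15">Manama - Bahrain</option>.
    // "code" captures the digits in the value attribute; "city" captures the
    // text before the " - " separator. The country after it is not captured.
    private static readonly Regex OptionPattern = new Regex(
        @"<option\s+value=""(?<code>\d+)""[^>]*>\s*(?<city>[^<]+?)\s+-\s+[^<]*</option>",
        RegexOptions.IgnoreCase);

    public static Dictionary<string, int> Extract(string html)
    {
        var places = new Dictionary<string, int>();
        foreach (Match m in OptionPattern.Matches(html))
        {
            string city = m.Groups["city"].Value.Trim();
            int code = int.Parse(m.Groups["code"].Value);
            // The indexer overwrites a repeated city; Add would throw instead.
            places[city] = code;
        }
        return places;
    }
}
```

Calling PlaceExtractor.Extract(htmlContent) would then return the filled dictionary. Requiring whitespace around the separator hyphen keeps hyphenated city names in one piece. For anything more irregular than a flat list of options, an HTML parser is usually a safer choice than a regex.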
