ET_Sharker

Reputation: 33

C#: How to filter/get the value of a specific column in an HTML table with Regex (get the value between two strings)

I want to filter the cells whose openspan_CellSchemaId is "941d3c8a-8d5d-42aa-943e-a07ccaba1629" and get the value inside the <TD> of each filtered cell. For the first filtered cell the value is "&nbsp; " and for the second it is "ROBO7 ".

I tried the regex below in C#, but it doesn't return the correct value. Any help is appreciated.

    Regex regex = new Regex(
        "(<TD[^>]*?>^941d3c8a-8d5d-42aa-943e-a07ccaba1629$<\\/TD>)",
        RegexOptions.IgnoreCase
        | RegexOptions.CultureInvariant
        | RegexOptions.IgnorePatternWhitespace
        | RegexOptions.Compiled
    );
    MatchCollection ms = regex.Matches(InputText);
    int index = 1;
    foreach (Match match in ms)
    {
        if (match.Groups[2].Value.StartsWith(Search))
            return index;
        index++;
    }
    return 0;
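The pattern above cannot match the sample table: without RegexOptions.Multiline, ^ and $ only match at the start and end of the whole input; the GUID sits inside the openspan_CellSchemaId attribute rather than between > and </TD>; and the pattern has a single capturing group, so match.Groups[2] is always an empty, unsuccessful group. A minimal sketch of a working variant, assuming the attribute value is double-quoted and each cell is closed by </TD> as in the sample, moves the GUID into the opening-tag part of the pattern and captures the cell text in group 1. CellExtractor and GetCellValues are hypothetical names.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class CellExtractor
{
    // Opening <TD ...> tag that carries the wanted GUID in openspan_CellSchemaId,
    // then group 1 lazily captures the cell text up to the first </TD>.
    // [^>]* cannot cross a '>', so the match never spills into a neighbouring cell.
    private static readonly Regex CellRegex = new Regex(
        "<TD[^>]*openspan_CellSchemaId=\"941d3c8a-8d5d-42aa-943e-a07ccaba1629\"[^>]*>(.*?)</TD>",
        RegexOptions.IgnoreCase
        | RegexOptions.CultureInvariant
        | RegexOptions.Singleline
        | RegexOptions.Compiled);

    // Returns the raw (still HTML-encoded) contents of every matching cell.
    public static List<string> GetCellValues(string html)
    {
        var values = new List<string>();
        foreach (Match match in CellRegex.Matches(html))
        {
            // The pattern has exactly one capturing group, so the text is in Groups[1].
            values.Add(match.Groups[1].Value);
        }
        return values;
    }
}
```

Calling GetCellValues on the question's table should return "&nbsp; " and "ROBO7 ", in row order.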


<TABLE class="mainlayout fixedlayouttable" cellSpacing=0 cellPadding=2 width=1223 summary=* border=0>
<TBODY>
<TR openspan_RowSchemaId="efbd02b8-fb2f-4f1c-8091-2d82e9bd0220">
<TD width=82 noWrap openspan_CellSchemaId="53174ff2-e55c-472b-ae59-6d430afb3f10"><A onclick="doScanNo('G1S1990706')" href="#">G1S1990706</A> </TD>
<TD width=95 openspan_CellSchemaId="4be80351-39e7-4f8a-be50-9233436a3266">G1:一般支払 </TD>
<TD width=100 openspan_CellSchemaId="6e5e1343-ef24-4e55-b2a6-9d7c422400c8">R1:QRなし </TD>
<TD width=84 noWrap openspan_CellSchemaId="99bb881d-2870-4742-9ebc-689337842cd5">&nbsp; </TD>
<TD width=160 noWrap openspan_CellSchemaId="26265b76-713e-4543-9bd5-334416e8b0df"><SPAN class=nowraptext><BR></SPAN></TD>
<TD width=186 noWrap openspan_CellSchemaId="785a8e77-1803-4123-94d4-f878f372b86d"><SPAN class=nowraptext><BR></SPAN></TD>
<TD width=78 noWrap openspan_CellSchemaId="83fd103d-ee4b-45e8-adb1-7df59fff170c">RobotAgent1 </TD>
<TD class=same width=79 noWrap openspan_CellSchemaId="7016e294-b4f4-433f-8711-d798e0b90cf1">2017/09/20 </TD>
<TD width=78 noWrap openspan_CellSchemaId="6c26eb0d-0195-45b8-a07b-5b0ac26c636d">ROBO7 </TD>
<TD width=78 noWrap openspan_CellSchemaId="a7ebc00f-32ad-4fcf-b991-81983587a91d">&nbsp; </TD>
<TD class=same width=79 noWrap openspan_CellSchemaId="6e9ce592-309b-4288-9e96-09c3f3a4374d">2017/11/07 </TD>
<TD width=78 noWrap openspan_CellSchemaId="941d3c8a-8d5d-42aa-943e-a07ccaba1629">&nbsp; </TD></TR>
<TR openspan_RowSchemaId="efbd02b8-fb2f-4f1c-8091-2d82e9bd0220">
<TD width=82 noWrap openspan_CellSchemaId="53174ff2-e55c-472b-ae59-6d430afb3f10"><A onclick="doScanNo('G1S1990716')" href="#">G1S1990716</A> </TD>
<TD width=95 openspan_CellSchemaId="4be80351-39e7-4f8a-be50-9233436a3266">G1:一般支払 </TD>
<TD width=100 openspan_CellSchemaId="6e5e1343-ef24-4e55-b2a6-9d7c422400c8">01:スキャン済 </TD>
<TD width=84 noWrap openspan_CellSchemaId="99bb881d-2870-4742-9ebc-689337842cd5">&nbsp; </TD>
<TD width=160 noWrap openspan_CellSchemaId="26265b76-713e-4543-9bd5-334416e8b0df"><SPAN class=nowraptext><BR></SPAN></TD>
<TD width=186 noWrap openspan_CellSchemaId="785a8e77-1803-4123-94d4-f878f372b86d"><SPAN class=nowraptext><BR></SPAN></TD>
<TD width=78 noWrap openspan_CellSchemaId="83fd103d-ee4b-45e8-adb1-7df59fff170c">RobotAgent2</TD>
<TD class=same width=79 noWrap openspan_CellSchemaId="7016e294-b4f4-433f-8711-d798e0b90cf1">2017/09/20 </TD>
<TD width=78 noWrap openspan_CellSchemaId="6c26eb0d-0195-45b8-a07b-5b0ac26c636d">ROBO7 </TD>
<TD width=78 noWrap openspan_CellSchemaId="a7ebc00f-32ad-4fcf-b991-81983587a91d">&nbsp; </TD>
<TD class=same width=79 noWrap openspan_CellSchemaId="6e9ce592-309b-4288-9e96-09c3f3a4374d">2017/11/13 </TD>
<TD width=78 noWrap openspan_CellSchemaId="941d3c8a-8d5d-42aa-943e-a07ccaba1629">ROBO7 </TD></TR>
</TBODY>
</TABLE>

Upvotes: 0

Views: 241

Answers (1)

oetoni

Reputation: 3877

The regex you need is the following (the lazy .*? stops at the first </TD> after the GUID instead of the last one on the line):

(?<=941d3c8a-8d5d-42aa-943e-a07ccaba1629">)(.*?)(?=</TD>)


If you want to test the regex, try it in an online regex tester to check that the conditions hold. It matches everything between the two strings, as per your condition.
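To see what the lookbehind and lookahead do on their own, here is a minimal sketch that runs the pattern, written with a lazy .*?, against the two target cells copied from the question's table. The lookbehind requires the GUID and the closing "> immediately before the match, the lookahead requires </TD> immediately after it, and neither is part of match.Value, so only the cell text comes back. LookaroundDemo and Extract are hypothetical names, and the input is assumed to have the GUID's closing quote followed directly by >.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LookaroundDemo
{
    // (?<=...) : the match must be preceded by the GUID plus the closing quote and '>'.
    // (.*?)    : lazily take the cell text.
    // (?=...)  : the match must be followed by </TD>.
    // Lookarounds are zero-width, so match.Value holds only the cell text.
    private const string Pattern =
        "(?<=941d3c8a-8d5d-42aa-943e-a07ccaba1629\">)(.*?)(?=</TD>)";

    public static List<string> Extract(string html)
    {
        var results = new List<string>();
        foreach (Match match in Regex.Matches(html, Pattern, RegexOptions.IgnoreCase))
        {
            results.Add(match.Value);
        }
        return results;
    }
}
```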

Here is a simple code example (I'm just printing the matches, but you can do whatever you like with them):

using System;
using System.IO;
using System.Text.RegularExpressions;

namespace test3
{
    class Program
    {
        static void Main(string[] args)
        {
            // Read the saved HTML table (replace the path with your own file).
            string createText = File.ReadAllText(@"C:\PATH_TO_THE FILE_I_USED\data.txt");

            // Lookbehind: the text must follow the GUID attribute value and its closing ">.
            // Lookahead: the text must be followed by </TD>.
            // No spaces inside the pattern, so IgnorePatternWhitespace is not needed.
            Regex regex = new Regex("(?<=941d3c8a-8d5d-42aa-943e-a07ccaba1629\">)(.*?)(?=</TD>)",
                RegexOptions.IgnoreCase
                | RegexOptions.CultureInvariant
                | RegexOptions.Compiled
                );

            MatchCollection ms = regex.Matches(createText);

            foreach (Match match in ms)
            {
                // Raw cell content, e.g. "&nbsp; " or "ROBO7 ".
                Console.WriteLine(match.Value);
            }

            Console.ReadLine();
        }
    }
}
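The program above hard-codes one GUID and prints the raw, still HTML-encoded cell text. If you need other columns as well, a sketch like the following builds the pattern from any openspan_CellSchemaId value with Regex.Escape, strips nested tags such as the <A> link in the scan-number column, decodes entities with WebUtility.HtmlDecode and trims the result. SchemaCellReader and ReadColumn are hypothetical names; the sketch still assumes double-quoted attribute values and no nested tables, so for arbitrary HTML a real HTML parser is the safer choice.

```csharp
using System.Collections.Generic;
using System.Net;
using System.Text.RegularExpressions;

static class SchemaCellReader
{
    // Returns the plain text of every <TD> whose openspan_CellSchemaId equals schemaId.
    public static List<string> ReadColumn(string html, string schemaId)
    {
        // Regex.Escape protects against regex metacharacters in the id.
        string pattern =
            "<TD[^>]*openspan_CellSchemaId=\"" + Regex.Escape(schemaId) +
            "\"[^>]*>(?<value>.*?)</TD>";

        var values = new List<string>();
        foreach (Match match in Regex.Matches(html, pattern,
                     RegexOptions.IgnoreCase | RegexOptions.Singleline))
        {
            // Remove nested markup first (e.g. <A ...>, <SPAN>, <BR>) so decoded
            // entities such as &lt; are not mistaken for tags afterwards.
            string text = Regex.Replace(match.Groups["value"].Value, "<[^>]+>", string.Empty);

            // "&nbsp;" decodes to U+00A0, which Trim() removes along with ordinary spaces.
            text = WebUtility.HtmlDecode(text).Trim();
            values.Add(text);
        }
        return values;
    }
}
```

With the question's table, ReadColumn(html, "941d3c8a-8d5d-42aa-943e-a07ccaba1629") should yield an empty string for the first row (the &nbsp; cell) and "ROBO7" for the second.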

Upvotes: 2
