Joel Deslauriers

Reputation: 43

Extract multiple substrings from the same line

I'm trying to build a log parser, but I'm stuck. Right now my program goes through multiple files in a directory and reads each file line by line. I was able to find the substring I was looking for, "fct=", and extract the value next to the "=" using a delimiter, but I noticed that when a line contains more than one "fct=", it doesn't see the others.

So I restarted my code and found a way to get the index of every occurrence of "fct=" in the same line, using an extension method that puts the indexes in a list, but I don't see how I can use this list to get the value next to the "=" using my delimiters.

How can I extract the value next to the "=", knowing the start position of "fct=" and the delimiter at the end of the wanted value?

I'm just starting out in C#, so let me know if I can give you more information. Thanks,

Here's an example of what I would like to parse:

<dat>FCT=10019,XN=KEY,CN=ROHWEPJQSKAUMDUC FCT=666</dat></logurl>
<dat>XN=KEY,CN=RTU FCT=4515</dat></logurl>
<dat>XN=KEY,CN=RT</dat></logurl>

I would like to retrieve 10019, 666 and 4515.

using System;
using System.Collections.Generic;
using System.IO;
using ExtensionMethods;

namespace LogParserV1
{
class Program
{

    static void Main(string[] args)
    {

        int counter = 0;
        string[] dirs = Directory.GetFiles(@"C:/LogParser/LogParserV1", "*.txt");
        string fctnumber;
        char[] enddelimiter = { '<', ',', '&', ':', ' ', '\\', '\'' };

        foreach (string fileName in dirs)
        {
            StreamReader sr = new StreamReader(fileName);

            {
                String lineRead;
                while ((lineRead = sr.ReadLine()) != null)
                {

                    if (lineRead.Contains("fct="))
                    {
                        List<int> list = MyExtensions.GetPositions(lineRead, "fct");
                        //int start = lineRead.IndexOf("fct=") + 4;
                       // int end = lineRead.IndexOfAny(enddelimiter, start);
                        //string result = lineRead.Substring(start, end - start);

                        //fctnumber = result; // 'result' only exists in the commented-out lines above

                        //System.Console.WriteLine(fctnumber);
                        list.ForEach(Console.WriteLine);
                    }
                    // prints every line: System.Console.WriteLine(lineRead);
                    counter++;
                }
                System.Console.WriteLine(fileName);

                sr.Close();
            }
        }

        // Suspend the screen.  
        System.Console.ReadLine();

    }
}
}


namespace ExtensionMethods
{
public  class MyExtensions
{
    public static List<int> GetPositions(string source, string searchString)
    {
        List<int> ret = new List<int>();
        int len = searchString.Length;
        int start = -len;
        while (true)
        {
            start = source.IndexOf(searchString, start + len);
            if (start == -1)
            {
                break;
            }
            else
            {
                ret.Add(start);
            }
        }
        return ret;
    }
    }
}

Upvotes: 0

Views: 544

Answers (6)

Colonel Software

Reputation: 1440

You can split the line on the string "fct=" and then cut each part at the first delimiter:

char[] enddelimiter = { '<', ',', '&', ':', ' ', '\\', '\'' };
while ((lineRead = sr.ReadLine()) != null)
{
    string[] parts1 = lineRead.Split(new string[] { "fct=" }, StringSplitOptions.None);

    // parts1[0] is the text before the first "fct=", not a value, so skip it.
    for (int i = 1; i < parts1.Length; i++)
    {
        string _ar = parts1[i];
        if (!string.IsNullOrEmpty(_ar))
        {
            int end = _ar.IndexOfAny(enddelimiter);
            if (end > 0)
            {
                // Value ends at the first delimiter.
                MessageBox.Show(_ar.Substring(0, end));
            }
            else if (end < 0)
            {
                // No delimiter: the value runs to the end of the line.
                MessageBox.Show(_ar);
            }
        }
    }
}

Upvotes: 0

Alapati Naresh Kumar

Reputation: 21

The code below uses LINQ to extract the words that are repeated in a text:

string text = "Hi Naresh, How are you. You will be next Super man";
    IEnumerable<string> strings = text.Split(' ').ToList();
    var result = strings.AsEnumerable().Select(x => new {str = Regex.Replace(x.ToLowerInvariant(), @"[^0-9a-zA-Z]+", ""), count = Regex.Matches(text.ToLowerInvariant(), @"\b" + Regex.Escape(Regex.Replace(x.ToLowerInvariant(), @"[^0-9a-zA-Z]+", "")) + @"\b").Count}).Where(x=>x.count>1).GroupBy(x => x.str).Select(x => x.First());
    foreach(var item in result)
    {
        Console.WriteLine(item.str +" = "+item.count.ToString());
    }

Upvotes: 1

claudiom248

Reputation: 346

I have tested this solution with your data, and it gives me the expected results (10019, 666 and 4515):

string data = @"<dat>FCT=10019,XN=KEY,CN=ROHWEPJQSKAUMDUC FCT=666</dat></logurl>
                <dat>XN=KEY,CN=RTU FCT=4515</dat></logurl>
                <dat>XN=KEY,CN=RT</dat></logurl>";

char[] delimiters = { '<', ',', '&', ':', ' ', '\\', '\'' };

Regex regex = new Regex("fct=(.+)", RegexOptions.IgnoreCase);

var values = data.Split(delimiters).Select(x => regex.Match(x).Groups[1].Value);
values = values.Where(x => !string.IsNullOrWhiteSpace(x));

values.ToList().ForEach(Console.WriteLine);  

I hope my solution will be helpful, let me know.

Upvotes: 1

Innat3

Reputation: 3576

You could simplify your code a lot by using Regex pattern matching instead.

The following pattern: (?<=FCT=)[0-9]* will match any group of digits preceded by FCT=.


This enables us to do the following:

string input = "<dat>FCT=10019,XN=KEY,CN=ROHWEPJQSKAUMDUC FCT=666</dat></logurl>...";
string pattern = "(?<=FCT=)[0-9]*";
var values = Regex.Matches(input, pattern).Cast<Match>().Select(x => x.Value);
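
A minimal sketch of how this pattern might be applied to one line at a time, as in the question's `ReadLine` loop. The names `FctExtractor` and `ExtractValues` are hypothetical and not part of the answer. Two things differ from the pattern above and are assumptions on my part: `RegexOptions.IgnoreCase` is added so that both `fct=` and `FCT=` match, and `+` is used instead of `*` so that an `FCT=` followed by a non-digit produces no empty match.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, shown only to illustrate the lookbehind pattern.
public static class FctExtractor
{
    // Lookbehind for "FCT=" (any letter case), then one or more digits.
    private static readonly Regex FctPattern =
        new Regex("(?<=FCT=)[0-9]+", RegexOptions.IgnoreCase);

    // Returns every run of digits that directly follows "fct=" in the line.
    public static List<string> ExtractValues(string line)
    {
        return FctPattern.Matches(line)
                         .Cast<Match>()
                         .Select(m => m.Value)
                         .ToList();
    }
}
```

Inside the loop this could be called as `FctExtractor.ExtractValues(lineRead)`. A line without any `fct=` simply gives an empty list, so the separate `Contains` check would no longer be needed.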

Upvotes: 1

vernou

Reputation: 7610

Something like this:

using System;
using System.Collections.Generic;

class Program
{
    static void Main(string[] args)
    {
        char[] enddelimiter = { '<', ',', '&', ':', ' ', '\\', '\'' };
        var fct = "fct=";

        var lineRead = "fct=value1,useless text fct=vfct=alue2,fct=value3";

        var values = new List<string>();
        int start = lineRead.IndexOf(fct);
        while(start != -1)
        {
            start += fct.Length;
            int end = lineRead.IndexOfAny(enddelimiter, start);
            if (end == -1)
                end = lineRead.Length;
            string result = lineRead.Substring(start, end - start);
            values.Add(result);
            start = lineRead.IndexOf(fct, end);
        }
        values.ForEach(Console.WriteLine);
    }
}
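
The loop above could be wrapped in a reusable method, sketched below under a few assumptions. The names `FctIndexScanner` and `ExtractAfterPrefix` are hypothetical, and the case-insensitive search (`StringComparison.OrdinalIgnoreCase`) is my addition, since the sample lines in the question use `FCT=` while the code searches for `fct=`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper; the names are illustrative only.
public static class FctIndexScanner
{
    private static readonly char[] EndDelimiters = { '<', ',', '&', ':', ' ', '\\', '\'' };

    // Finds every occurrence of the prefix (ignoring case) and returns the
    // text between the end of the prefix and the next delimiter.
    public static List<string> ExtractAfterPrefix(string line, string prefix)
    {
        if (string.IsNullOrEmpty(prefix))
            throw new ArgumentException("Prefix must not be empty.", nameof(prefix));

        var values = new List<string>();
        int start = line.IndexOf(prefix, StringComparison.OrdinalIgnoreCase);
        while (start != -1)
        {
            start += prefix.Length;                       // first character of the value
            int end = line.IndexOfAny(EndDelimiters, start);
            if (end == -1)
                end = line.Length;                        // no delimiter: value runs to end of line
            values.Add(line.Substring(start, end - start));
            start = line.IndexOf(prefix, end, StringComparison.OrdinalIgnoreCase);
        }
        return values;
    }
}
```

In the question's loop this might be used as `FctIndexScanner.ExtractAfterPrefix(lineRead, "fct=")`. Because the next search starts at `end`, a prefix embedded inside a value (as in `vfct=alue2`) is treated as part of that value, the same as in the code above.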

Upvotes: 0

InBetween

Reputation: 32770

As always, break the problem down into smaller bits. See if the following methods help in any way. Tying them into your code is left as an exercise.

private const string Prefix = "fct=";

//make delimiter look up fast
private static HashSet<char> endDelimiters = 
    new HashSet<char>(new [] { '<', ',', '&', ':', ' ', '\\', '\'' });

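// string.Split has no overload taking only a string[]; StringSplitOptions is required.
private static string[] GetAllFctFields(string line) =>
    line.Split(new[] { Prefix }, StringSplitOptions.None);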

private static bool TryGetValue(string delimitedString, out string value)
{
    var buffer = new StringBuilder(delimitedString.Length);

    foreach (var c in delimitedString)
    {
        if (endDelimiters.Contains(c)) 
            break;

        buffer.Append(c);
    }

    //I'm assuming that no end delimiter is a format error.
    //Modify according to requirements
    if (buffer.Length == delimitedString.Length) 
    {
        value = null;
        return false;
    }

    value = buffer.ToString();
    return true;
}

Upvotes: 0
