Reputation: 43
I'm trying to build a log parser, but I'm stuck. Right now my program goes through multiple files in a directory and reads each file line by line. I was able to identify the substring I was looking for, "fct=", and extract the value after the "=" using a delimiter, but I noticed that when a line contains more than one "fct=", only the first one is found.
So I restarted my code and found a way to get the index positions of all occurrences of "fct=" in the same line, using an extension method that puts the indexes in a list, but I don't see how I can use this list to get the value after the "=" up to my delimiter.
How can I extract the value after the "=", knowing the start position of "fct=" and the delimiter at the end of the wanted value?
I'm just starting in C#, so let me know if I can give you more information. Thanks,
Here's an example of what I would like to parse:
<dat>FCT=10019,XN=KEY,CN=ROHWEPJQSKAUMDUC FCT=666</dat></logurl>
<dat>XN=KEY,CN=RTU FCT=4515</dat></logurl>
<dat>XN=KEY,CN=RT</dat></logurl>
I would like to retrieve 10019, 666 and 4515.
using System;
using System.Collections.Generic;
using System.IO;
using ExtensionMethods;

namespace LogParserV1
{
    class Program
    {
        static void Main(string[] args)
        {
            int counter = 0;
            string[] dirs = Directory.GetFiles(@"C:/LogParser/LogParserV1", "*.txt");
            string fctnumber;
            char[] enddelimiter = { '<', ',', '&', ':', ' ', '\\', '\'' };
            foreach (string fileName in dirs)
            {
                StreamReader sr = new StreamReader(fileName);
                {
                    String lineRead;
                    while ((lineRead = sr.ReadLine()) != null)
                    {
                        if (lineRead.Contains("fct="))
                        {
                            List<int> list = MyExtensions.GetPositions(lineRead, "fct");
                            //int start = lineRead.IndexOf("fct=") + 4;
                            //int end = lineRead.IndexOfAny(enddelimiter, start);
                            //string result = lineRead.Substring(start, end - start);
                            //fctnumber = result;
                            //System.Console.WriteLine(fctnumber);
                            list.ForEach(Console.WriteLine);
                        }
                        // prints every line: System.Console.WriteLine(lineRead);
                        counter++;
                    }
                    System.Console.WriteLine(fileName);
                    sr.Close();
                }
            }
            // Suspend the screen.
            System.Console.ReadLine();
        }
    }
}
namespace ExtensionMethods
{
    public class MyExtensions
    {
        public static List<int> GetPositions(string source, string searchString)
        {
            List<int> ret = new List<int>();
            int len = searchString.Length;
            int start = -len;
            while (true)
            {
                start = source.IndexOf(searchString, start + len);
                if (start == -1)
                {
                    break;
                }
                else
                {
                    ret.Add(start);
                }
            }
            return ret;
        }
    }
}
Upvotes: 0
Views: 544
Reputation: 1440
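You can split each line on the string "fct=" and then cut every part at the first end delimiter:
char[] enddelimiter = { '<', ',', '&', ':', ' ', '\\', '\'' };
while ((lineRead = sr.ReadLine()) != null)
{
    // Note: this split is case-sensitive, and the sample data uses "FCT=".
    string[] parts1 = lineRead.Split(new string[] { "fct=" }, StringSplitOptions.None);
    // parts1[0] is the text before the first "fct=", so start at index 1.
    for (int i = 1; i < parts1.Length; i++)
    {
        string _ar = parts1[i];
        int end = _ar.IndexOfAny(enddelimiter);
        // With no end delimiter, the value runs to the end of the line.
        string value = end >= 0 ? _ar.Substring(0, end) : _ar;
        if (!string.IsNullOrEmpty(value))
        {
            MessageBox.Show(value);
        }
    }
}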
Upvotes: 0
Reputation: 21
The code below is useful for extracting the repeated words from a text with LINQ:
string text = "Hi Naresh, How are you. You will be next Super man";
IEnumerable<string> strings = text.Split(' ').ToList();
var result = strings.AsEnumerable()
    .Select(x => new
    {
        str = Regex.Replace(x.ToLowerInvariant(), @"[^0-9a-zA-Z]+", ""),
        count = Regex.Matches(text.ToLowerInvariant(), @"\b" + Regex.Escape(Regex.Replace(x.ToLowerInvariant(), @"[^0-9a-zA-Z]+", "")) + @"\b").Count
    })
    .Where(x => x.count > 1)
    .GroupBy(x => x.str)
    .Select(x => x.First());
foreach (var item in result)
{
    Console.WriteLine(item.str + " = " + item.count.ToString());
}
Upvotes: 1
Reputation: 346
I have tested this solution with your data, and it gives me the expected results (10019, 666 and 4515).
string data = @"<dat>FCT=10019,XN=KEY,CN=ROHWEPJQSKAUMDUC FCT=666</dat></logurl>
<dat>XN=KEY,CN=RTU FCT=4515</dat></logurl>
<dat>XN=KEY,CN=RT</dat></logurl>";
char[] delimiters = { '<', ',', '&', ':', ' ', '\\', '\'' };
Regex regex = new Regex("fct=(.+)", RegexOptions.IgnoreCase);
var values = data.Split(delimiters).Select(x => regex.Match(x).Groups[1].Value);
values = values.Where(x => !string.IsNullOrWhiteSpace(x));
values.ToList().ForEach(Console.WriteLine);
I hope my solution will be helpful, let me know.
Upvotes: 1
Reputation: 3576
You could simplify your code a lot by using Regex pattern matching instead.
The following pattern, (?<=FCT=)[0-9]*, will match any group of digits preceded by FCT=.
This enables us to do the following:
string input = "<dat>FCT=10019,XN=KEY,CN=ROHWEPJQSKAUMDUC FCT=666</dat></logurl>...";
string pattern = "(?<=FCT=)[0-9]*";
var values = Regex.Matches(input, pattern).Cast<Match>().Select(x => x.Value);
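As a hedged sketch of how this pattern could be wrapped for the line-by-line loop in the question: the class name FctRegexSketch and the method ExtractFctValues are made up for illustration. RegexOptions.IgnoreCase is added on the assumption that both "fct=" and "FCT=" should match, and [0-9]+ is used instead of [0-9]* so that an "FCT=" with no digits after it yields no match at all.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; not part of the original answer.
public static class FctRegexSketch
{
    // Lookbehind for "FCT=", then one or more digits.
    // IgnoreCase is an assumption so that "fct=" is matched as well.
    private static readonly Regex FctPattern =
        new Regex("(?<=FCT=)[0-9]+", RegexOptions.IgnoreCase);

    // Returns every run of digits that directly follows "FCT=" in the line.
    public static List<string> ExtractFctValues(string line)
    {
        return FctPattern.Matches(line)
                         .Cast<Match>()
                         .Select(m => m.Value)
                         .ToList();
    }
}
```

Called once per line read from the file, this returns 10019 and 666 for the first sample line and an empty list for a line without any FCT= field.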
Upvotes: 1
Reputation: 7610
Something like:
class Program
{
    static void Main(string[] args)
    {
        char[] enddelimiter = { '<', ',', '&', ':', ' ', '\\', '\'' };
        var fct = "fct=";
        var lineRead = "fct=value1,useless text fct=vfct=alue2,fct=value3";
        var values = new List<string>();
        int start = lineRead.IndexOf(fct);
        while (start != -1)
        {
            start += fct.Length;
            int end = lineRead.IndexOfAny(enddelimiter, start);
            if (end == -1)
                end = lineRead.Length;
            string result = lineRead.Substring(start, end - start);
            values.Add(result);
            start = lineRead.IndexOf(fct, end);
        }
        values.ForEach(Console.WriteLine);
    }
}
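Since the question asks specifically how to use an existing list of start positions, here is a minimal sketch of the same Substring logic driven by that list. The names FctPositionSketch and ValuesAt are hypothetical, and the sketch assumes the positions were collected for the full prefix "fct=" (the question's code searches for "fct" without the "=", which would also hit any other text starting with "fct").

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of the original answer.
public static class FctPositionSketch
{
    private static readonly char[] EndDelimiters = { '<', ',', '&', ':', ' ', '\\', '\'' };

    // positions: start index of each prefix occurrence in the line, for example
    // the list returned by the question's GetPositions(line, "fct=").
    public static List<string> ValuesAt(string line, IEnumerable<int> positions, string prefix)
    {
        var values = new List<string>();
        foreach (int position in positions)
        {
            // The value begins right after the prefix.
            int start = position + prefix.Length;
            if (position < 0 || start > line.Length)
                continue; // defensive: ignore positions that do not fit the line

            // The value ends at the first delimiter, or at the end of the line.
            int end = line.IndexOfAny(EndDelimiters, start);
            if (end == -1)
                end = line.Length;

            values.Add(line.Substring(start, end - start));
        }
        return values;
    }
}
```

Unlike the loop above, this version evaluates every position independently, so a prefix nested inside another value (as in "vfct=alue2") produces an extra entry of its own.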
Upvotes: 0
Reputation: 32770
As always, break the problem down into smaller bits. See if the following methods help in any way. Tying them into your code is left as an exercise.
private const string Prefix = "fct=";

//make delimiter look up fast
private static HashSet<char> endDelimiters =
    new HashSet<char>(new [] { '<', ',', '&', ':', ' ', '\\', '\'' });

//Split(string[]) needs an explicit StringSplitOptions argument.
//Note that the first element is the text before the first prefix.
private static string[] GetAllFctFields(string line) =>
    line.Split(new string[] { Prefix }, StringSplitOptions.None);

private static bool TryGetValue(string delimitedString, out string value)
{
    var buffer = new StringBuilder(delimitedString.Length);
    foreach (var c in delimitedString)
    {
        if (endDelimiters.Contains(c))
            break;
        buffer.Append(c);
    }
    //I'm assuming that no end delimiter is a format error.
    //Modify according to requirements
    if (buffer.Length == delimitedString.Length)
    {
        value = null;
        return false;
    }
    value = buffer.ToString();
    return true;
}
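One possible way to tie the two pieces together is sketched below as a self-contained class. FctFieldsSketch and ExtractValues are hypothetical names, the first split field is skipped on the assumption that it is only the text preceding the first prefix, and a field without an end delimiter is dropped to mirror the "format error" assumption above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; not part of the original answer.
public static class FctFieldsSketch
{
    private const string Prefix = "fct=";

    private static readonly HashSet<char> EndDelimiters =
        new HashSet<char>(new[] { '<', ',', '&', ':', ' ', '\\', '\'' });

    public static List<string> ExtractValues(string line)
    {
        var values = new List<string>();
        string[] fields = line.Split(new[] { Prefix }, StringSplitOptions.None);

        // fields[0] is whatever precedes the first prefix, so start at 1.
        for (int i = 1; i < fields.Length; i++)
        {
            string field = fields[i];

            // Walk forward until the first end delimiter.
            int end = 0;
            while (end < field.Length && !EndDelimiters.Contains(field[end]))
                end++;

            // Assumption carried over from the answer: a field with no end
            // delimiter is treated as a format error and skipped.
            if (end == field.Length)
                continue;

            values.Add(field.Substring(0, end));
        }
        return values;
    }
}
```

If values that run to the end of the line should be accepted instead, the `continue` branch is the only place that needs to change.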
Upvotes: 0