I'm trying to read data from a text file in a C# application. There are multiple lines of data, and each of them starts with an integer followed by a number of double values. Part of the text file looks like this:
33 0.573140941467E-01 0.112914262390E-03 0.255553577735E-02 0.497192659486E-04 0.141869181079E-01-0.147813598922E-03
34 0.570076593453E-01 0.100112550891E-03 0.256427138318E-02-0.868691490164E-05 0.142821920093E-01-0.346011975369E-03
35 0.715507714946E-01 0.316132133031E-03-0.106581466521E-01-0.920513736900E-04 0.138018668842E-01-0.212219497066E-03
Here 33, 34, 35 are integer values, each followed by 6 double values. These double values are not guaranteed to have a space or any other delimiter between them: if a double is negative, its "-" takes the place of the space. So it's possible that all 6 double values run together.
Now the challenge is, how to extract this gracefully?
What I tried:
String.Split(' ');
This does not work, because a space is not guaranteed between adjacent values.
This can be easily solved in C++ using sscanf:
int i;
double a, b, c, d, e, f;
sscanf(line, "%d %lf%lf%lf%lf%lf%lf", &i, &a, &b, &c, &d, &e, &f);
// here line contains a line of data from the text file.
The text file containing the double values is generated by a third-party tool and I have no control over its output.
Is there a way the integer and double values can be gracefully extracted line by line?
Upvotes: 16
Views: 2598
Reputation: 23797
If I am seeing that right, you have a "fixed-width data" format. Then you can simply parse based on that fact.
For example, assuming the values are in a file d:\temp\doubles.txt:
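void Main()
{
    var filename = @"d:\temp\doubles.txt";

    // Cut a line into a 2-character integer field followed by six 19-character double fields.
    Func<string, string[]> split = (s) =>
    {
        string[] res = new string[7];
        res[0] = s.Substring(0, 2);
        for (int i = 0; i < 6; i++)
        {
            res[i + 1] = s.Substring(2 + (i * 19), 19);
        }
        return res;
    };

    // CultureInfo lives in System.Globalization; the invariant culture
    // keeps "." as the decimal separator regardless of the machine's locale.
    var culture = CultureInfo.InvariantCulture;
    var result = from l in File.ReadAllLines(filename)
                 let la = split(l)
                 select new
                 {
                     i = int.Parse(la[0], culture),
                     d1 = double.Parse(la[1], culture),
                     d2 = double.Parse(la[2], culture),
                     d3 = double.Parse(la[3], culture),
                     d4 = double.Parse(la[4], culture),
                     d5 = double.Parse(la[5], culture),
                     d6 = double.Parse(la[6], culture)
                 };

    foreach (var e in result)
    {
        Console.WriteLine($"{e.i}, {e.d1}, {e.d2}, {e.d3}, {e.d4}, {e.d5}, {e.d6}");
    }
}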
Outputs:
33, 0.0573140941467, 0.00011291426239, 0.00255553577735, 4.97192659486E-05, 0.0141869181079, -0.000147813598922
34, 0.0570076593453, 0.000100112550891, 0.00256427138318, -8.68691490164E-06, 0.0142821920093, -0.000346011975369
35, 0.0715507714946, 0.000316132133031, -0.0106581466521, -9.205137369E-05, 0.0138018668842, -0.000212219497066
PS: In your real data, the int field is probably wider than the 2 characters shown in this sample, so allocate more width for it.
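The fixed-width idea can also be sketched with the field widths as parameters, so the integer field can be widened without touching the loop. This is only a sketch: the class name FixedWidthSketch and the default widths (2 and 19) are assumptions read off the three sample lines, not something the third-party tool documents.

```csharp
using System;
using System.Globalization;

public static class FixedWidthSketch
{
    // intWidth and doubleWidth are assumptions taken from the sample data:
    // a 2-character integer followed by 19-character double fields.
    public static (int Index, double[] Values) ParseLine(
        string line, int intWidth = 2, int doubleWidth = 19)
    {
        int index = int.Parse(line.Substring(0, intWidth), CultureInfo.InvariantCulture);

        // Number of complete double fields that fit after the integer field.
        int count = (line.Length - intWidth) / doubleWidth;
        var values = new double[count];
        for (int i = 0; i < count; i++)
        {
            string field = line.Substring(intWidth + i * doubleWidth, doubleWidth);
            // NumberStyles.Float allows leading whitespace, a sign, a decimal point and an exponent.
            values[i] = double.Parse(field, NumberStyles.Float, CultureInfo.InvariantCulture);
        }
        return (index, values);
    }
}
```

Calling FixedWidthSketch.ParseLine on the first sample line gives index 33 and six values, the last of which is negative. If the real file pads the integer to a wider column, pass that width as intWidth.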
Upvotes: 10
Reputation: 178
Yet another solution, processing each line by itself and including the int value:
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

class Program {
    static void Main(string[] args) {
        string[] fileLines = {
            "33 0.573140941467E-01 0.112914262390E-03 0.255553577735E-02 0.497192659486E-04 0.141869181079E-01-0.147813598922E-03",
            "34 0.570076593453E-01 0.100112550891E-03 0.256427138318E-02-0.868691490164E-05 0.142821920093E-01-0.346011975369E-03",
            "35 0.715507714946E-01 0.316132133031E-03-0.106581466521E-01-0.920513736900E-04 0.138018668842E-01-0.212219497066E-03"
        };
        // Optional sign, digits, then an optional fraction with an optional exponent.
        var rex = new Regex(@"\b([-+]?\d+(?:\.\d+(?:E[+-]\d+)?)?)\b", RegexOptions.Compiled);
        foreach (var line in fileLines) {
            var dblValues = new List<double>();
            foreach (Match match in rex.Matches(line)) {
                string strVal = match.Groups[1].Value;
                double number = Double.Parse(strVal, NumberFormatInfo.InvariantInfo);
                dblValues.Add(number);
            }
            Console.WriteLine(string.Join("; ", dblValues));
        }
        Console.ReadLine();
    }
}
The result/output is:
33; 0,0573140941467; 0,00011291426239; 0,00255553577735; 4,97192659486E-05; 0,0141869181079; -0,000147813598922
34; 0,0570076593453; 0,000100112550891; 0,00256427138318; -8,68691490164E-06; 0,0142821920093; -0,000346011975369
35; 0,0715507714946; 0,000316132133031; -0,0106581466521; -9,205137369E-05; 0,0138018668842; -0,000212219497066
Upvotes: 0
Reputation: 186668
If we can't use string.Split, we can try to split by regular expression with the help of Regex.Split; for a given line
string line = @" 33 0.573140941467E-01 0.112914262390E-03 0.255553577735E-02 0.497192659486E-04 0.141869181079E-01-0.147813598922E-03";
We can try
// Split either
//   1. by space
//   2. zero length "char" which is just after a [0..9] digit and followed by "-" or "+"
var items = Regex
    .Split(line, @" |((?<=[0-9])(?=[+-]))")
    .Where(item => !string.IsNullOrEmpty(item)) // we don't want empty parts
    .Skip(1)                                    // skip 1st 33
    .Select(item => double.Parse(item));        // we want double
Console.WriteLine(string.Join(Environment.NewLine, items));
and get
0.573140941467E-01
0.112914262390E-03
0.255553577735E-02
0.497192659486E-04
0.141869181079E-01
-0.147813598922E-03
For a text file, we should split each line:
Regex regex = new Regex(@" |((?<=[0-9])(?=[+-]))");
var records = File
    .ReadLines(@"c:\MyFile.txt")
    .Select(line => regex
        .Split(line)
        .Where(item => !string.IsNullOrEmpty(item))
        .Skip(1)
        .Select(item => double.Parse(item))
        .ToArray());
Demo:
string[] test = new string[] {
    // your examples
    " 33 0.573140941467E-01 0.112914262390E-03 0.255553577735E-02 0.497192659486E-04 0.141869181079E-01-0.147813598922E-03",
    " 34 0.570076593453E-01 0.100112550891E-03 0.256427138318E-02-0.868691490164E-05 0.142821920093E-01-0.346011975369E-03",
    " 35 0.715507714946E-01 0.316132133031E-03-0.106581466521E-01-0.920513736900E-04 0.138018668842E-01-0.212219497066E-03",
    // Some challenging cases (mine)
    " 36 123+456-789 123e+78 9.9e-95 0.0001",
};
Regex regex = new Regex(@" |((?<=[0-9])(?=[+-]))");
var records = test
    .Select(line => regex
        .Split(line)
        .Where(item => !string.IsNullOrEmpty(item))
        .Skip(1)
        .Select(item => double.Parse(item))
        .ToArray());
string testReport = string.Join(Environment.NewLine, records
    .Select(record => $"[{string.Join(", ", record)}]"));
Console.WriteLine(testReport);
Outcome:
[0.0573140941467, 0.00011291426239, 0.00255553577735, 4.97192659486E-05, 0.0141869181079, -0.000147813598922]
[0.0570076593453, 0.000100112550891, 0.00256427138318, -8.68691490164E-06, 0.0142821920093, -0.000346011975369]
[0.0715507714946, 0.000316132133031, -0.0106581466521, -9.205137369E-05, 0.0138018668842, -0.000212219497066]
[123, 456, -789, 1.23E+80, 9.9E-95, 0.0001]
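The snippets above skip the leading integer and parse with the current culture. A possible variation that keeps the integer and parses with the invariant culture could look like the sketch below. The class name SplitSketch and the tuple return shape are illustrative assumptions; the separator is the same idea as above, with `\s+` swapped in for the single space so that runs of whitespace count as one separator.

```csharp
using System;
using System.Globalization;
using System.Linq;
using System.Text.RegularExpressions;

public static class SplitSketch
{
    // Split on whitespace runs, or at the zero-width position
    // just after a digit and just before a '+' or '-'.
    private static readonly Regex Separator = new Regex(@"\s+|(?<=[0-9])(?=[+-])");

    public static (int Index, double[] Values) ParseLine(string line)
    {
        string[] items = Separator.Split(line)
            .Where(item => item.Length > 0)   // drop the empty piece before leading whitespace
            .ToArray();

        int index = int.Parse(items[0], CultureInfo.InvariantCulture);
        double[] values = items.Skip(1)
            .Select(item => double.Parse(item, NumberStyles.Float, CultureInfo.InvariantCulture))
            .ToArray();
        return (index, values);
    }
}
```

The sign of an exponent is never split off, because it is preceded by "E" or "e" rather than by a digit.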
Upvotes: 1
Reputation: 706
The answers I have seen so far are quite complex. Here is a simple one that avoids overthinking.
Following @Veljko89's comment, I have updated the code to support any number of values per line.
List<double> ParseLine(string line)
{
    List<double> ret = new List<double>();
    // The leading integer ends at the first space.
    ret.Add(double.Parse(line.Substring(0, line.IndexOf(' '))));
    line = line.Substring(line.IndexOf(' ') + 1);
    // Each double ends 4 characters after its 'E' (the 'E', a sign and two exponent digits).
    for (; !string.IsNullOrWhiteSpace(line); line = line.Substring(line.IndexOf('E') + 4))
    {
        ret.Add(double.Parse(line.Substring(0, line.IndexOf('E') + 4)));
    }
    return ret;
}
Upvotes: 1
Reputation: 4240
You could do this:
public void ParseFile(string fileLocation)
{
    string[] lines = File.ReadAllLines(fileLocation);
    foreach (var line in lines)
    {
        // Split just before any '-' that is not preceded by 'E', or on a space.
        string[] parts = Regex.Split(line, "(?((?<!E)-)| )");
        if (parts.Any())
        {
            int first = int.Parse(parts[0]);
            double[] others = parts.Skip(1).Select(a => double.Parse(a)).ToArray();
        }
    }
}
Upvotes: 1
Reputation: 384
I just went with a non-optimal approach: I replaced every "E-" with a placeholder, replaced each remaining negative sign with a space and a negative sign (" -"), and then restored the "E-" values.
After that I was able to use Split to extract the values.
private static IEnumerable<double> ExtractValues(string values)
{
    return values
        .Replace("E-", "E*")
        .Replace("-", " -")
        .Replace("E*", "E-")
        .Split(' ')
        .Select(v => double.Parse(v));
}
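A slightly hardened variant of the same placeholder trick is sketched below. It assumes that "*" never occurs in the data, drops empty pieces so that a space followed by a minus sign does not produce an empty string, and parses with the invariant culture. The class name ReplaceSketch is hypothetical.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class ReplaceSketch
{
    public static double[] ExtractValues(string line)
    {
        return line
            .Replace("E-", "E*")   // protect exponent signs; assumes '*' never occurs in the data
            .Replace("-", " -")    // every remaining '-' starts a new value
            .Replace("E*", "E-")   // restore the exponents
            .Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(v => double.Parse(v, NumberStyles.Float, CultureInfo.InvariantCulture))
            .ToArray();
    }
}
```

As in the original method, the leading integer comes back as the first double in the result, so for a sample line the array has seven elements.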
Upvotes: 3
Reputation: 178
Solve this with a regular expression. My first shot is:
"[\s-+]\d+\.\d+E[+-]\d\d"
I just tried it this way:
using System;
using System.Globalization;
using System.Text.RegularExpressions;

namespace ConsoleApp1 {
    class Program {
        static void Main(string[] args) {
            var fileContents =
                "33 0.573140941467E-01 0.112914262390E-03 0.255553577735E-02 0.497192659486E-04 0.141869181079E-01-0.147813598922E-03"
                + "34 0.570076593453E-01 0.100112550891E-03 0.256427138318E-02-0.868691490164E-05 0.142821920093E-01-0.346011975369E-03"
                + "35 0.715507714946E-01 0.316132133031E-03-0.106581466521E-01-0.920513736900E-04 0.138018668842E-01-0.212219497066E-03";
            var rex = new Regex(@"[\s-+]\d+\.\d+E[+-]\d\d", RegexOptions.Multiline);
            foreach (Match match in rex.Matches(fileContents)) {
                double d = double.Parse(match.Value.TrimStart(), NumberFormatInfo.InvariantInfo);
                Console.WriteLine("found a match: " + match.Value.TrimStart() + " => " + d);
            }
            Console.ReadLine();
        }
    }
}
With this output (German locale, with a comma as the decimal separator):
found a match: 0.573140941467E-01 => 0,0573140941467
found a match: 0.112914262390E-03 => 0,00011291426239
found a match: 0.255553577735E-02 => 0,00255553577735
found a match: 0.497192659486E-04 => 4,97192659486E-05
found a match: 0.141869181079E-01 => 0,0141869181079
found a match: -0.147813598922E-03 => -0,000147813598922
found a match: 0.570076593453E-01 => 0,0570076593453
found a match: 0.100112550891E-03 => 0,000100112550891
found a match: 0.256427138318E-02 => 0,00256427138318
found a match: -0.868691490164E-05 => -8,68691490164E-06
found a match: 0.142821920093E-01 => 0,0142821920093
found a match: -0.346011975369E-03 => -0,000346011975369
found a match: 0.715507714946E-01 => 0,0715507714946
found a match: 0.316132133031E-03 => 0,000316132133031
found a match: -0.106581466521E-01 => -0,0106581466521
found a match: -0.920513736900E-04 => -9,205137369E-05
found a match: 0.138018668842E-01 => 0,0138018668842
found a match: -0.212219497066E-03 => -0,000212219497066
Upvotes: 6