Reputation: 183
I have a lot of data points in a .dat file that looks like this:
+ ( 0.00000000E+00 0.00000000E+00   // this '(' appears once per block of data
+ 0.99999997E-04 0.00000000E+00
+ 0.19999999E-03 0.00000000E+00
+ ...
I have no control over the program that spits out this data, so I can't make its output friendlier to work with.
So far I have each line in a vector, and I want to parse the lines so I only have the numbers to work with. However, I still need to preserve the integrity of the .dat file, because another program uses the .dat file as is.
I was thinking of splitting each string on spaces (the gaps are different sizes, unless that doesn't matter), placing the pieces in a vector, and taking only the data I need. But the first line of each block has 4 tokens, whereas the rest of the lines have 3.
Any help would be greatly appreciated.
Edit: I'm taking the original .dat file and tracing through it. Any block of data that doesn't meet my threshold gets passed over; any block that does gets written to a new file. Everything in this new file must be exactly the same as in the original file, minus the data I don't need, of course.
[JD] Edit per comments:
How would I parse these lines, keeping each line exactly as it is without removing anything from it, and still get the numbers so I can decide what to keep and what to drop?
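One way to satisfy both requirements is to never modify the lines at all: read each line as a string, pull the numbers out of a copy, and decide per block whether to write the original lines back out. The sketch below assumes that a block starts at any line containing '(' and that the file contains nothing but blocks; numbers_in, keep_block, filter_blocks and the "any value at or above the threshold" rule are hypothetical stand-ins for your real criterion.

```cpp
#include <cstdlib>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Pull every token that parses completely as a number out of one line.
// The line itself is never modified. "+", "(" and ")" are skipped because
// std::strtod cannot convert them; the "+" in "E+00" stays part of its number.
std::vector<double> numbers_in(const std::string& line)
{
    std::istringstream in(line);
    std::vector<double> nums;
    std::string tok;
    while (in >> tok) {  // operator>> treats any run of spaces as one gap
        char* end = nullptr;
        double v = std::strtod(tok.c_str(), &end);
        if (end != tok.c_str() && *end == '\0')  // whole token consumed
            nums.push_back(v);
    }
    return nums;
}

// Hypothetical criterion: keep a block if any of its values is >= threshold.
bool keep_block(const std::vector<std::string>& block, double threshold)
{
    for (const std::string& line : block)
        for (double v : numbers_in(line))
            if (v >= threshold)
                return true;
    return false;
}

// Copy in -> out, dropping whole blocks that fail keep_block.
// Assumption: a new block starts at each line containing '('.
// Kept lines are written back exactly as read, each followed by '\n'.
void filter_blocks(std::istream& in, std::ostream& out, double threshold)
{
    std::vector<std::string> block;
    auto flush_block = [&] {
        if (keep_block(block, threshold))
            for (const std::string& l : block)
                out << l << '\n';
        block.clear();
    };
    std::string line;
    while (std::getline(in, line)) {
        if (line.find('(') != std::string::npos)
            flush_block();  // finish the previous block
        block.push_back(line);
    }
    flush_block();  // the last block
}
```

Opening the original file as a std::ifstream and the new file as a std::ofstream and passing both to filter_blocks copies the qualifying blocks unchanged. Writing the original strings back, rather than re-printing the parsed doubles, matters here: C++ stream formatting produces mantissas like 9.9999997e-05, not the 0.99999997E-04 form the file uses.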
Upvotes: 2
Views: 213
Reputation: 490808
I would create a ctype facet that classifies +, ( [Edit: and ), based on a comment] as white space, then just read the numbers. Let's assume your criterion for keeping a number is that it's at least, say, 1.0e-4. To copy the data to a new file (by redirecting standard output), removing the smaller numbers, you could do something like this:
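#include <locale>
#include <iostream>
#include <algorithm>
#include <iterator>
#include <cstddef>

// A ctype facet that treats '(', '+' and ')' as white space, so formatted
// input skips them and only the numbers are read. The '+' inside an exponent
// such as E+00 is still read as part of the number, because white space is
// only skipped between values.
class my_ctype : public std::ctype<char>
{
    mask my_table[table_size];
public:
    my_ctype(std::size_t refs = 0)
        : std::ctype<char>(&my_table[0], false, refs)
    {
        // Start from the classic "C" classification, then mark the
        // punctuation used in the .dat file as white space.
        std::copy_n(classic_table(), table_size, my_table);
        my_table['('] = (mask)space;
        my_table['+'] = (mask)space;
        my_table[')'] = (mask)space;
    }
};

int main() {
    // The locale takes ownership of the facet (refs == 0).
    std::locale x(std::locale::classic(), new my_ctype);
    std::cin.imbue(x);
    std::remove_copy_if(std::istream_iterator<double>(std::cin),
                        std::istream_iterator<double>(),
                        std::ostream_iterator<double>(std::cout, "\n"),
                        [](double in){ return in < 1.0e-4; }); // criterion for removing a number
    return 0;
}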
I'd guess (but don't really know) that your criterion for removing a number is probably a little more complex than a simple comparison. If it gets much more complex, you probably want to use a manually-defined functor instead of a lambda to define your criterion. The rest of the code (especially the part reading the data) can probably remain unchanged though.
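A minimal sketch of such a functor, assuming a hypothetical criterion that drops values whose magnitude is below a limit (below_threshold and keep_values are illustrative names, not part of the answer above). Unlike the lambda, the functor carries its limit as a member, so the same type can be reused with different thresholds.

```cpp
#include <algorithm>
#include <cmath>
#include <iterator>
#include <vector>

// Hypothetical criterion: remove a value whose magnitude is below `limit`.
// Storing the limit as a member is what makes a named functor handy
// once the rule grows beyond a one-line comparison.
struct below_threshold
{
    double limit;
    explicit below_threshold(double l) : limit(l) {}
    bool operator()(double v) const { return std::fabs(v) < limit; }
};

// Same remove_copy_if call shape as in the answer, but on a vector.
std::vector<double> keep_values(const std::vector<double>& in, double limit)
{
    std::vector<double> out;
    std::remove_copy_if(in.begin(), in.end(), std::back_inserter(out),
                        below_threshold(limit));
    return out;
}
```

In the program above, the lambda would simply be replaced by below_threshold(1.0e-4); the reading and writing parts stay the same.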
Also note that as-is, I've just written numbers to the output one per line. I don't know if you need to maintain something closer to the original format or not, so for the moment I just kept it simple.
Upvotes: 3
Reputation: 101506
You can get one item at a time using a file stream's operator>>, which skips whitespace. When you get to the column that holds either '(' or a blank (i.e., whitespace), check what you read and branch on it. If you got '(', call operator>> again to get the actual data. If you didn't get '(', then you already got data, because operator>> skips whitespace.
Here's a hopefully complete example:
#include <string>
#include <iostream>
#include <fstream>
using namespace std;
struct Inbound
{
    std::string a_, b_;
};
int main()
{
    ifstream f("c:\\dev\\hacks\\data.txt");
    string s;
    // Test each read directly; checking eof() before reading would run the
    // loop body one extra time after the last record.
    while( f >> s )          // should be '+' -- discard
    {
        f >> s;              // either '(' or first datum
        if( s == "(" )
            f >> s;          // get the first datum
        Inbound in;
        in.a_ = s;
        if( !(f >> in.b_) )  // incomplete record or end of file: stop
            break;
        cout << "Got: " << in.a_ << "\t" << in.b_ << endl;
    }
}
Output:
Got: 0.00000000E+00 0.00000000E+00
Got: 0.99999997E-04 0.00000000E+00
Got: 0.19999999E-03 0.00000000E+00
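The same token-by-token approach also yields the block boundaries the question needs, since the '(' check is exactly where a new block begins. A sketch under that assumption (Block and read_blocks are illustrative names), storing each record's two values as a pair of doubles:

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

using Block = std::vector<std::pair<double, double>>;

// Each record is "+", an optional "(", then two numbers.
// A "(" starts a new block; reading stops at the first incomplete record.
std::vector<Block> read_blocks(std::istream& f)
{
    std::vector<Block> blocks;
    std::string plus, tok;
    while (f >> plus >> tok) {            // '+' (discarded), then '(' or first datum
        bool starts_block = (tok == "(");
        if (starts_block && !(f >> tok))  // get the first datum
            break;
        double b;
        if (!(f >> b))                    // second datum
            break;
        if (starts_block || blocks.empty())
            blocks.emplace_back();
        blocks.back().emplace_back(std::stod(tok), b);
    }
    return blocks;
}

// Convenience overload for text already held in memory.
std::vector<Block> read_blocks(const std::string& text)
{
    std::istringstream in(text);
    return read_blocks(in);
}
```

This only extracts the values; if the kept blocks must be written out byte-for-byte, the original lines still have to be copied rather than re-printed from these doubles.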
Upvotes: 0
Reputation: 9651
You should use a string tokenizer to grab each item. Depending on the libraries you are already using, it could be very simple.
Otherwise, you can make something very simple by using strtok.
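For the strtok route, a minimal sketch (tokenize is an illustrative name). Because strtok writes '\0' characters into the buffer it scans, the function works on a copy, so the original line stays intact; and since strtok treats a run of delimiters as a single separator, the uneven spacing in the .dat file does not produce empty tokens. Note that strtok keeps hidden state between calls, so two tokenizations must not be interleaved, and it is not safe to call from multiple threads.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Split `line` on any of the characters in `delims` using strtok.
// strtok modifies the buffer it scans, so tokenize a private copy.
std::vector<std::string> tokenize(const std::string& line, const char* delims = " \t")
{
    std::vector<char> buf(line.begin(), line.end());
    buf.push_back('\0');  // strtok needs a NUL-terminated buffer
    std::vector<std::string> tokens;
    for (char* p = std::strtok(buf.data(), delims); p != nullptr;
         p = std::strtok(nullptr, delims))
        tokens.emplace_back(p);
    return tokens;
}
```

The first line of a block then yields 4 tokens ("+", "(", and the two numbers) and every other line yields 3, so the numbers are always the last two tokens.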
If you are using MFC's CString, you can write something yourself, like this:
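#include <afx.h>      // CString (MFC)
#include <afxcoll.h>  // CStringArray (MFC)

// CStringArray derives from CObject and cannot be copied, so the result is
// returned through an output parameter instead of by value.
void TokenizeString(const CString& str, const CString& sep, CStringArray& elements)
{
    elements.RemoveAll();
    if (sep.IsEmpty()) { elements.Add(str); return; }  // avoid an endless loop
    CString strCpy = str;
    int sepPos = strCpy.Find(sep);
    while (sepPos != -1)
    {
        // extract item; skip empty items produced by consecutive separators
        // (e.g. the runs of spaces in the .dat file)
        CString item = strCpy.Left(sepPos);
        if (!item.IsEmpty())
            elements.Add(item);
        // prepare next loop: keep the part of the string after the separator
        strCpy = strCpy.Mid(sepPos + sep.GetLength());
        sepPos = strCpy.Find(sep);
    }
    // add last item if needed (remaining part of the string)
    if (!strCpy.IsEmpty()) elements.Add(strCpy);
}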
Hope this helps!
Upvotes: 0