Reputation: 969
I am trying to read in a data file of approximately 2000 lines. The file looks something like this:
1.1 1.2 1.3 1.4 1.5
1.6 1.7 1.8 1.9
2.0
2.1 2.2 2.3 2.4 2.5
Where a value is missing there is actually a blank (white space) in its place, so 1.3 and 1.7 are in the same column.
The way I have it set up for storage is a vector of structs, where
struct num
{
    double d1, d2, d3, d4, d5;
};
What I am trying to achieve is
num A;
vector<num> data;
for (int i = 0; i < 4; i++)
{
    File >> A.d1 >> A.d2 >> A.d3 >> A.d4 >> A.d5;
    data.push_back(A);
}
and to find the logic to recognize the blank space in the second line and store d1=1.6, d2=0, d3=1.7, etc., and for the third line to store d1=2.0 and d2, d3, d4, d5 = 0. I am confused about how to test for this and implement the logic, if it is possible at all. I am using C++ with VS2010.
After looking at the first answer, I thought I should provide more info: each line in the file belongs to a satellite, and each number represents an observation on a specific wavelength, so a blank means there is no observation on that wavelength.
To elaborate: the first line represents satellite 1, which has an observation on all 5 wavelengths; line 2 represents satellite 2, which has observations on wavelengths 1, 3, 4 and 5 and none on wavelength 2.
That's why I am trying to store each line as a separate struct, because each line is a separate satellite.
Upvotes: 0
Views: 1877
Reputation: 1099
Given that your file format is space delimited, you can extract the columns using a regular expression. I've assumed you can use C++11 or, if not, Boost regex.
You can then use the following function to split a string into tokens.
std::vector<std::string> split(const std::string& input, const std::regex& regex) {
    // passing -1 as the submatch index parameter performs splitting
    std::sregex_token_iterator
        first(input.begin(), input.end(), regex, -1),
        last;
    return std::vector<std::string>(first, last);
}
As an example, assuming your data is in "data.txt", I used it this way to get the values:
#include <iostream>
#include <fstream>
#include <string>
#include <regex>
#include <vector>
using namespace std;
std::vector<std::string> split(const string& input, const regex& regex) {
    // passing -1 as the submatch index parameter performs splitting
    std::sregex_token_iterator
        first(input.begin(), input.end(), regex, -1),
        last;
    return vector<std::string>(first, last);
}
int main()
{
    ifstream f("data.txt");
    string s;
    while (getline(f, s))
    {
        vector<string> values = split(s, regex("\\s"));
        for (unsigned i = 0; i < values.size(); ++i)
        {
            cout << "[" << values[i] << "] ";
        }
        cout << endl;
    }
    return 0;
}
Which gives the following results:
[1.1] [1.2] [1.3] [1.4] [1.5]
[1.6] [] [1.7] [1.8] [1.9]
[2.0] [] [] []
[2.1] [2.2] [2.3] [2.4] [2.5]
Note that the third row has only four columns, but that's because I'm not quite sure how many white spaces you have on that line. If you know there are no more than 5 columns, then it could be corrected in the output stage.
Hopefully you find this approach helpful.
Upvotes: 1
Reputation: 41858
Why not just use std::vector to hold an array of floats?
To add a new element to the vector you use push_back.
As you read in each character, look to see if it is a digit or a period.
If it is, add that to a std::string, and then use atof with mystring.c_str() as the parameter to convert it to a float.
This may also help convert a string to float:
std::string to float or double
So, read into a string, then push the float to a vector, and repeat, skipping characters that are not a digit or a period.
At the end of the line your vector has all the floats, and if you want to join them into a string with a custom delimiter you can look at the answers to this question:
std::vector to string with custom delimiter
Upvotes: 0
Reputation: 10716
Observing your data, this is what I came up with:
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
#include <cstdlib>
#include <sstream>
#include <iomanip>
#include <cctype>
using namespace std;
//note all the lines are stored WITH newlines at the end of them.
//This is merely an artifact of the methodology I am using,
//as the newline is a flag that truncates output (as per your problem)
vector<string> preparse_input(const std::string& filename) {
    vector<string> lines;
    ifstream ifile;
    ifile.open(filename.c_str(), ios::in);
    if (!ifile.is_open()) {
        cerr << "Could not open \"" << filename << "\"\n";
        exit(1);
    }
    string temp, chars, line;
    char ch;
    while(getline(ifile, temp)) {
        temp += "\n";//getline removes the newline: because we need it, reinsert it
        istringstream iss(temp);
        //first read in the line char by char
        while(iss >> noskipws >> ch) {
            chars += ch;
        }
        bool replaced_newline = false;
        int nargs = 0;
        //I could have used iterators here, but IMO, this way is easier to read. Modify if need be.
        for (size_t i = 0; i < chars.size(); ++i) {
            if (isdigit(chars[i]) && chars[i+1] == ' ') {
                nargs += 1;
            }
            else if(isspace(chars[i]) && isspace(chars[i+1])) {
                if (chars[i+1] == '\n') {
                    replaced_newline = true;
                }
                //this means that there is no value set
                //hence, set the value to 0 for the value part:
                chars[i+1] = '0';
                line += chars[i];
                ++i;//now, skip to the next character since 1 is for spacing, the other is for the value
                nargs += 1;
            }
            //now rebuild the line:
            line += chars[i];
            if(isdigit(chars[i]) && chars[i+1] == '\n') {
                nargs += 1;
                //check nargs: pad the line with zeros up to 5 values
                for (int k = nargs; k < 5; ++k) {
                    line += " 0";
                    nargs += 1;
                }
            }
            if (replaced_newline) {
                line += '\n';
            }
            replaced_newline = false;
        }
        lines.push_back(line);
        chars.clear();
        line.clear();
    }
    ifile.close();
    return lines;
}
//this way, it's much easier to adapt to any type of input that you may have
template <typename T>
vector< vector<T> > parse_input (const vector<string>& lines) {
    vector< vector<T> > values;
    T val = 0;
    for(vector<string>::const_iterator it = lines.begin(); it != lines.end(); ++it) {
        vector<T> line;
        istringstream iss(*it);
        string temp;
        while(getline(iss, temp, ' ')) {
            if (istringstream(temp) >> val) {
                line.push_back(val);
            }
            else {
                line.push_back(0);//this is the value that badly parsed values will be set to.
                //You have the option of setting it to some sentinel value, say -1, so you can
                //go back and correct it later on, if need be. Depending on how you want to treat
                //this error, hard or soft (stop program execution vs adapt and continue parsing),
                //you can adapt it accordingly.
                //I opted to treat it as a soft error without a sentinel value, so I set it to 0
                //(-1 is probably more applicable in the general case) and informed the user that
                //an error occurred.
                //The flipside is that I could have treated this as a hard error and called
                //`exit(2)` (or whatever error code you wish to set).
                cerr << "There was a problem storing:\"" << temp << "\"\n";
            }
        }
        values.push_back(line);
    }
    return values;
}
int main() {
    string filename = "data.dat";
    vector<string> lines = preparse_input(filename);
    vector < vector<double> > values = parse_input<double>(lines);
    for (size_t i = 0; i < values.size(); ++i) {
        for (size_t j = 0; j < values[i].size(); ++j) {
            cout << values[i][j] << " ";
        }
        cout << endl;
    }
    return 0;
}
Summarily, I broke down the string by reading each line character by character, and then rebuilt each line, replacing the blanks with 0 for easier parsing. Why? Because without some value like that, there is no way to tell which parameter was stored or skipped (using the default ifstream_object >> type methodology).
This way, if I then use stringstream objects to parse the input, I can correctly determine which parameter is set or not set, then store the results, and everything is dandy. Which is what you desire.
And, using it on the following data:
1.1 1.2 1.3 1.4 1.5
1.6 1.7 1.8 1.9
2.0
2.0
2.1 2.2 2.3 2.4 2.5
2.1 2.4
Gives you the output:
1.1 1.2 1.3 1.4 1.5
1.6 0 1.7 1.8 1.9
2 0 0 0 0
2 0 0 0 0
2.1 2.2 2.3 2.4 2.5
2.1 0 0 2.4 0
NOTE: Line 3 has 8 spaces after the 2.0 (for each missing value, 1 space for the missing data and 1 for spacing). Line 4 is the line from your original data. Line 6 contains 5 spaces (following the pattern cited).
Lastly, let me say that this is by far one of the most insane methods of storing data I've ever encountered.
Upvotes: 2