Reputation: 65
I have a .txt file and need to read from it. The file contains data about cities: their longitude, latitude and some other fields.
This is the data format:
DE 01945 **Tettau** Brandenburg BB 00 Landkreis Oberspreewald-Lausitz 12066 **51.4333 13.7333**
DE 01968 **Schipkau Hörlitz** Brandenburg BB 00 Landkreis Oberspreewald-Lausitz 12066 **51.5299 13.9508**
...
Each line of the file describes one city, but only the bold fields matter to me (name, latitude, longitude). In total the file has about 16k lines. Can you please explain how I can extract this information?
QFile file ("path");
QTextStream in (&file);
while (!in.atEnd()) {
QString line = in.readLine();
std::string s = line.toLocal8Bit().constData();
std::cout << s << endl;
}
file.close();
So far I can only read the whole line, but I have no idea how to get these three pieces of information from each line. I created a class "City" with three members: _name, _longitude, _latitude. I then wanted to create a vector to store every city in. Is this approach efficient? More importantly, please tell me how I can read the three bold fields from every line, because I have no idea how to do it. (I thought about iterating through every character of the string and searching for tabs, but that took extremely long.) I would be really happy if you could show me a fast way to do it. The program is developed in Qt with C++.
PS: I also noticed the problem that some city names consist of two words separated by a space.
Upvotes: 2
Views: 69
Reputation: 243927
The file you have is a tab-separated values (TSV) file, so the approach is to read each line, split it on the tab character, and then pick the fields you need, as shown below:
#include <QFile>
#include <QStringList>
#include <QTextStream>
#include <iostream>
#include <string>
#include <vector>

struct CityData
{
    std::string city;
    float latitude;
    float longitude;
};

int main()
{
    QFile file("/path/of/DE.txt");
    if(!file.open(QFile::ReadOnly | QFile::Text))
        return -1;
    QTextStream stream(&file);
    QString line;
    std::vector<CityData> datas;
    while (stream.readLineInto(&line)) {
        // Fields are tab-separated: index 2 is the place name,
        // index 9 the latitude and index 10 the longitude.
        QStringList elements = line.split("\t");
        if (elements.size() < 11)
            continue; // skip malformed lines instead of indexing out of range
        CityData data{elements[2].toStdString(),
                      elements[9].toFloat(),
                      elements[10].toFloat()
        };
        datas.push_back(data);
    }
    for(const CityData & data: datas){
        std::cout<< "city: "<< data.city <<"\t" << "latitude: "<< data.latitude <<"\t" << "longitude: "<<data.longitude<<"\n";
    }
    return 0;
}
Output:
city: Tettau latitude: 51.4333 longitude: 13.7333
city: Guteborn latitude: 51.4167 longitude: 13.9333
city: Hermsdorf latitude: 51.4055 longitude: 13.8937
city: Grünewald latitude: 51.4 longitude: 14
city: Hohenbocka latitude: 51.431 longitude: 14.0098
city: Lindenau latitude: 51.4 longitude: 13.7333
city: Ruhland latitude: 51.4576 longitude: 13.8664
city: Schwarzbach latitude: 51.45 longitude: 13.9333
city: Kroppen latitude: 51.3833 longitude: 13.8
city: Schipkau Hörlitz latitude: 51.5299 longitude: 13.9508
city: Senftenberg latitude: 51.5252 longitude: 14.0016
city: Schipkau latitude: 51.5456 longitude: 13.9121
...
For this kind of dataset, you should read the accompanying readme.txt:
...
The data format is tab-delimited text in utf8 encoding, with the following fields :
country code : iso country code, 2 characters
postal code : varchar(20)
place name : varchar(180)
admin name1 : 1. order subdivision (state) varchar(100)
admin code1 : 1. order subdivision (state) varchar(20)
admin name2 : 2. order subdivision (county/province) varchar(100)
admin code2 : 2. order subdivision (county/province) varchar(20)
admin name3 : 3. order subdivision (community) varchar(100)
admin code3 : 3. order subdivision (community) varchar(20)
latitude : estimated latitude (wgs84)
longitude : estimated longitude (wgs84)
accuracy : accuracy of lat/lng from 1=estimated to 6=centroid
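The field list above maps directly onto zero-based indices after splitting on tabs. As a rough sketch of the same idea without Qt, using only the standard library: the names `splitTabs`, `parseCity` and the `Field` enum are made up for this illustration, and `double` is used for the coordinates as an assumption of mine rather than something the readme requires.

```cpp
#include <cstddef>
#include <exception>
#include <sstream>
#include <string>
#include <vector>

// Zero-based positions taken from the readme's field list.
enum Field : std::size_t {
    PlaceName = 2,
    Latitude = 9,
    Longitude = 10
};

struct CityData {
    std::string city;
    double latitude;
    double longitude;
};

// Split a line on tab characters. Empty fields in the middle of the
// line are kept, so the indices stay aligned with the readme.
std::vector<std::string> splitTabs(const std::string& line) {
    std::vector<std::string> fields;
    std::string field;
    std::istringstream stream(line);
    while (std::getline(stream, field, '\t')) {
        fields.push_back(field);
    }
    return fields;
}

// Fill `out` from one line. Returns false if the line has too few
// fields or the coordinates are not valid numbers.
bool parseCity(const std::string& line, CityData& out) {
    const std::vector<std::string> fields = splitTabs(line);
    if (fields.size() <= Longitude) {
        return false;
    }
    try {
        out.city = fields[PlaceName];
        out.latitude = std::stod(fields[Latitude]);
        out.longitude = std::stod(fields[Longitude]);
    } catch (const std::exception&) {
        return false; // std::stod throws on non-numeric or out-of-range input
    }
    return true;
}
```

Calling `parseCity` once per line and pushing the result into a `std::vector<CityData>` only when it returns true gives the same result as the Qt version, with malformed lines skipped.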
Upvotes: 1
Reputation: 3902
Essentially, you only need to split your line on the delimiter:
QStringList delimited = line.split(" ");
QString town = delimited[2];
to get Tettau or Schipkau in your example; the other items work likewise.
That said, I'm not sure about the "Schipkau Hörlitz" case in your example. I assume it is the name of a single town, or of a quarter of a town, with a composite name. How to handle it depends on your format. One option is to start at index 2 and keep appending tokens as long as the next one is not the name of a German state. Of course, this will then only work for Germany. You could also look for the next token that consists only of digits, "00" in your example, and work backwards from there. Again, this depends on your format, and I hope I gave you enough to work with.
It might look like this:
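QStringList delimited = line.split(" ");
QString town = delimited[2];
size_t pos = 3;
// Append tokens to the town name until the state name is reached.
while(not is_german_state(delimited[pos]))
{
    town += " " + delimited[pos];
    pos++;
}
// In the sample lines the latitude comes before the longitude.
QString latitude = delimited[pos+6];
QString longitude = delimited[pos+7];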
(Note that I did not handle lines that are not properly formatted; for those, delimited[pos] or the indices used for latitude and longitude may be out of range and crash the program.)
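The other option mentioned above, finding the first token made up only of digits and working backwards from it, could be sketched roughly as follows with the standard library instead of Qt. The function names are invented for this example, and it relies on an assumption that may not hold for every line: that the state name and the state code are each exactly one token, as in the two sample lines from the question.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// True if the token is non-empty and consists only of decimal digits.
bool isAllDigits(const std::string& token) {
    return !token.empty() &&
           std::all_of(token.begin(), token.end(),
                       [](unsigned char c) { return std::isdigit(c) != 0; });
}

// Extract the town name from a space-separated line.
// Assumed layout (hypothetical, based on the sample lines only):
//   <country> <postal code> <town...> <state> <state code> <digits> ...
// Returns an empty string if the line does not match that layout.
std::string townBeforeNumericToken(const std::string& line) {
    std::istringstream stream(line);
    std::vector<std::string> tokens;
    for (std::string token; stream >> token;) {
        tokens.push_back(token);
    }

    // Find the first all-digit token after the postal code (index 1).
    std::size_t numeric = 2;
    while (numeric < tokens.size() && !isAllDigits(tokens[numeric])) {
        ++numeric;
    }
    // At least one town token plus state and state code must precede it.
    if (numeric >= tokens.size() || numeric < 5) {
        return "";
    }

    // Everything from index 2 up to, but excluding, the state name and
    // state code (the two tokens before the numeric one) is the town.
    std::string town = tokens[2];
    for (std::size_t i = 3; i + 2 < numeric; ++i) {
        town += " " + tokens[i];
    }
    return town;
}
```

This only recovers the name. Because names after the state (such as the county) can themselves contain spaces, fixed offsets to the coordinates are fragile when splitting on spaces.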
After that you store it in some way, for example in a vector<TownData>, where TownData is a structure that holds the data you need, and in each iteration you append to the vector. I assume it is clear how to do that, but ask if it isn't.
In Qt, it generally pays to look at the documentation of the classes you are currently using. In this case that is QString, which has a lot of functionality.
Since a vector reallocates and moves or copies its elements whenever it outgrows its capacity, and you asked about efficiency in particular, it would be a good idea to reserve enough space for the vector before you enter the loop. I'm not aware of any way to get the number of lines in a file without actually iterating through them, so you would either need to do that once before you work with the data, or come up with an estimate, for example from the file size, or simply assume 16k. Then call vector::reserve(size_type n) on your vector. That said, 16k lines does not sound like much, so this may well be premature optimization. I'd probably go without the reservation first and simply see whether it runs smoothly as it is.
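For illustration, reserving up front could look like the sketch below. `TownData` and `makeTownStorage` are placeholder names, and the line count passed in is just the estimate from the question, not a measured value.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Placeholder record type; the members are assumptions for this sketch.
struct TownData {
    std::string name;
    double latitude;
    double longitude;
};

// Create an empty vector whose capacity already covers the expected
// number of lines, so appending up to that many entries does not
// trigger a reallocation.
std::vector<TownData> makeTownStorage(std::size_t expectedLines) {
    std::vector<TownData> towns;
    towns.reserve(expectedLines);
    return towns;
}
```

With something like `auto towns = makeTownStorage(16000);` before the read loop, each `push_back` up to the reserved size reuses the same storage. If the estimate is too low, the vector simply grows as usual.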
Upvotes: 1