Reputation: 61
Update: I have a couple of what are probably silly questions about commenter 6502's answer (below). If anyone could help, I'd really appreciate it.
1) I understand that data 1 and data 2 are the maps, but I don't understand what allkeys is for. Can anyone explain?
2) I know that: data1[vector1[i].name] = vector1[i].value; means assign a value to the map of interest where the correct label is... But I don't understand this: vector1[i].name and vector1[i].value. Are't "name" and "value" two separate vectors of labels and values? So what are they doing on vector1? Shouldn't this read, name[i] and value[i] instead?
Thanks everyone.
I have written code for performing a calculation. The code uses data from elsewhere. The calculation code is fine, but I'm having trouble manipulating the data.
The data exist as sets of vectors. Each set has one vector of labels (names, these are strings) and a corresponding set of values (doubles or ints).
The problem is that I need each data set to have the same name/label in the same column as the other data sets. This problem is not the same as sorting the data in the vectors (which I know how to do) because sometimes names/labels can be missing from some vectors.
For example:
Data set 1:
vector names1 = Jim, Tom, Mary
vector values1 = 1 2 3
Data set 2:
vector names2 = Tom, Mary, Joan
vector values2 = 2 3 4
I want (pseudo-code) ONE name vector that has all possible names. I also want each corresponding numbers vector to be sorted the SAME way:
vector namesUniversal = Jim, Joan, Mary, Tom
vector valuesUniversal1 = 1 0 3 2
vector valuesUniversal2 = 0 4 3 2
What I want to do is come up with a universal vector that contains ALL the labels/names sorted alphabetically and all the corresponding numerical data sorted too.
Can anyone tell me whether there is an elegant way to do this in c++? I guess I could compare each element of each name vector with each element of each other name vector, but this seems quite clunky and I would not know how to get the data into the right columns in the corresponding data vectors. Thanks for any advice.
Upvotes: 0
Views: 893
Reputation: 7656
6502's solution looks fine at first glance. You should probably use std::merge
for the merging part though.
EDIt:
I forgot to mention that there is now also a multiway_merge
extension of the STL available in the GNU version of the STL. It is a part of the parallel mode, so it resides in the namespace __gnu_parallel
. If you need to do multiway merging, it will be very hard to come up with something as fast or simple to use as this.
Upvotes: 3
Reputation: 114579
The algorithm you are looking for is usually named "merging". Basically you sort the two data sets and then look at data in pairs: if the keys are equal then you process and output the pair, otherwise you process and advance only the smallest one.
You must also handle the case where one of the two lists ends before the other (this can be avoided by using special flag values that are guaranteed to be higher than any value you need to process).
The following is pseudocode for merging
vector1
vector2
index1 = index2 = 0;
index1 >= vector1.size()
and index2 >= vector2.size()
(in other words until both vectors are exhausted)index1 == vector1.size()
(i.e. if vector1
has been processed) then output vector2[index2++]
index2 == vector2.size()
(i.e. if vector2
has been processed) then output vector1[index1++]
vector1[index1] == vector2[index2]
output merged data and increment both index1
and index2
vector1[index1] < vector2[index2]
output vector1[index1++]
vector2[index2++]
However in C++ you can implement a much easier to write solution that is probably still fast enough (warning: untested code!):
std::map<std::string, int> data1, data2;
std::set<std::string> allkeys;
for (int i=0,n=vector1.size(); i<n; i++)
{
allkeys.insert(vector1[i].name);
data1[vector1[i].name] = vector1[i].value;
}
for (int i=0,n=vector2.size(); i<n; i++)
{
allkeys.insert(vector2[i].name);
data2[vector2[i].name] = vector2[i].value;
}
for (std::set<std::string>::iterator i=allkeys.begin(), e=allkeys.end();
i!=e; ++i)
{
const std::string& key = *i;
std::cout << key << data1[key] << data2[key] << std::endl;
}
The idea is to just build two maps data1
and data2
from name to values, and at the same time collecting all keys that are appearing in a std::set
of keys named allkeys
(adding the same name to a set multiple times does nothing).
After the collection phase this set can then be iterated to find all the names that have been observed and for each name the value can be retrieved from data1
and data2
maps (std::map<std::string, int>
will return 0 when looking for the value of a name that has not been added to the map).
Technically this is sort of overkilling (uses three balanced trees to do the processing that would have required just two sort operations) but is less code and probably acceptable anyway.
Upvotes: 4
Reputation: 1427
A quick way which comes to mind is to use a map<pair<string, int>, int>
and for each value store it in the map with the right key. (For example (Tom, 2) in the first values set will be under the key (Tom, 1) with value 2)
Once the map is ready iterate over it and build whatever data structure you want (Assuming the map is not enough for you).
Upvotes: 1
Reputation: 14695
I think you need to alter how you store this data. It looks like you're saying each number is logically associated with the name in the same position: Jim = 1, Mary = 3, etc.
If so, and you want to stick with a vector
of some kind, you could redo your data structure like so:
typedef std::pair<std::string, int> NameNumberPair;
typedef std::vector<NameNumberPair> NameNumberVector;
NameNumberVector v1;
You'll need to write your own operator<
which returns based on the sort order of the underlying names. However, as Nawaz points out, a map
would be a better way to represent the associated nature of the data.
Upvotes: 0