Reputation: 4925
I have two space separated strings... (the X doesn't mean the same symbol)
st1 = "abc def kok...."
st2 = "kok bbr def ffe ...."
i would like to construct an intersection string as follows:
common = "kok def"
what is the efficient way to do so in c++?
Thanks
Upvotes: 1
Views: 2784
Reputation: 490623
To expand a bit on the answers you've already gotten, there are basically two factors to consider that you haven't specified. First, if there are duplicate elements in the input, do you want those considered for the output. For example, given input like:
st1 = "kok abc def kok...."
st2 = "kok bbr kok def ffe ...."
Since "kok" appears twice in both inputs, should "kok" appear once in the output or twice?
The second is your usage pattern. Do you have a pattern of reading all the input, then generating a single output, or is a more iterative, where you might read some input, generate an output, read some more input that's added to the previous, generate another output, and so on?
If you're going read all the input, then generate one output, you probably want to use std::vector
followed by std::sort
. If you only want each input to appear only once in the output, regardless of how often it appears in both inputs, then you'd follow that by std::unique
, and finally do your set_intersection
.
If you want to support iterative updates, then you probably want to use std::set
or std::multiset
(std::set
for each output to be unique, std::multiset
if repeated inputs should give repeated results).
Edit: based on the lack of duplication in the input, a really quick simple implementation would be something like:
#include <string>
#include <set>
#include <algorithm>
#include <iterator>
#include <sstream>
#include <iostream>
int main() {
std::string st1("abc def kok");
std::string st2("kok bbr def ffe");
std::istringstream s1(st1);
std::istringstream s2(st2);
// Initialize stringstreams. Whine about most vexing parse.
std::set<std::string> words1((std::istream_iterator<std::string>(s1)),
std::istream_iterator<std::string>());
std::set<std::string> words2((std::istream_iterator<std::string>(s2)),
std::istream_iterator<std::string>());
std::ostringstream common;
// put the intersection into common:
std::set_intersection(words1.begin(), words1.end(),
words2.begin(), words2.end(),
std::ostream_iterator<std::string>(common, " "));
std::cout << common.str(); // show the result.
return 0;
}
Upvotes: 1
Reputation: 34621
I'm assuming you've tokenized your strings already (this solution seems easy to implement).
// Data
std::vector<string> a,b;
a.push_back("abc");b.push_back("kok");
a.push_back("def");b.push_back("bbr");
a.push_back("kok");b.push_back("def");
a.push_back("foo");b.push_back("ffe");
// Allocate space for intersection
std::vector<string> v(a.size()+b.size());
// Sort as required by set_intersection
std::sort(a.begin(),a.end());
std::sort(b.begin(),b.end());
// Compute
std::vector<string>::iterator it = std::set_intersection(a.begin(),a.end(),b.begin(),b.end(),v.begin());
// Display
v.erase(it,v.end());
for(std::vector<string>::iterator it = v.begin();it < v.end(); ++it) std::cout<<*it<<std::endl;
Complexity should be O(n log n) in the number of tokens (or sub-strings).
Upvotes: 10
Reputation: 69280
st1
in substrings and put them all into a std::set
st2
in substrings and check for each of them if they exist in the set created in step 1.This will give O(n log n)
execution time. You have to loop through both strings exactly once. Insertion and retrieval from the set is usually O(log n)
for each element, which gives O(n log n)
.
If you can use a hash based set (or some other unordered set) with O(1)
insert and retrieval complexity you will cut the complexity down to O(n)
.
Upvotes: 2