Reputation: 147
This is a largely conceptual question so i dont have any code to show. I'll try to explain this as best i can. I am writing a program that is supposed to find common sequences of numbers found in a large table of random combinations.
So for example take this data:
1 5 3 9 6 3 8 8 3 3
6 7 5 5 5 4 9 2 0 1
6 4 4 3 7 8 3 9 5 6
2 4 2 4 5 5 3 4 7 7
1 5 6 3 4 9 9 3 3 2
0 2 7 9 4 5 3 9 8 3
These are random combinatinos of the numbers 1-9. For every 3 digit (or more) sequence found more than once i need to put that into another database. So the first row contains "5 3 9" and the 6th row also contains "5 3 9". I would put that sequence in a separate table with the number of times it was found.
I'm still working out the algorithm for actually making these comparisons but i figure i'll have to start with "1 5 3", compare that to every single 3 number trio found, then move on to "5 3 9" then "3 9 6" etc....
MY MAIN PROBLEM RIGHT NOW is that i dont know how to do this if these numbers are stored in a database. My database table has 11 columns. One column for each individual number, and one column for the 10 digit sequence as a whole. Columns are called Sequence, 1stNum, 2ndNum, 3rdNum...10thNum.
Visual: first row in my database for the data above would be this :
| 1 5 3 9 6 3 8 8 3 3 | 1 | 5 | 3 | 9 | 6 | 3 | 8 | 8 | 3 | 3 |
("|" divide columns)
How do i make comparisons efficiently with Java? I'm iterating over every row in the table many times. Once for the initial sequence to be compared, and for every one of those sequences i go through each row. Basically a for loop in a for loop. This sounds like its going to take a ton of queries and could take forever if the table gets to be massive (which it will).
Is it more computationally efficient if i iterate through a database using queries or if i dump the database and iterate through a file?
I tried to explain this as best as i could, its a very confusing process for me. I can clarify anything you need me to. I just need guidance on what the best course of action for this would be.
Upvotes: 3
Views: 1165
Reputation: 25980
Here is what I would do, assuming you have retrieved the sequences in a list :
List<String> sequences = Arrays.asList("1539638833","6755549201","6443783956","2424553477","1563499332","0279453983");
Map<String,Integer> count = new HashMap<>();
for (String seq : sequences) {
int length = seq.length();
for (int i=0 ; i<length - 2 ; i++) {
String sub = seq.substring(i,i + 3);
count.put(sub,count.containsKey(sub) ? count.get(sub) + 1 : 1);
}
}
System.out.println(count);
Ouput :
{920=1, 783=1, 945=1, 332=1, 963=1, 644=1, 156=1, 983=1, 453=1, 153=1, 388=1, 534=1,
455=1, 245=1, 539=2, 554=1, 242=1, 555=1, 553=1, 437=1, 883=1, 349=1, 755=1, 675=1,
638=1, 395=1, 201=1, 956=1, 933=1, 499=1, 634=1, 839=1, 794=1, 027=1, 477=1, 833=1,
347=1, 492=1, 378=1, 279=1, 993=1, 443=1, 396=1, 398=1, 549=1, 563=1, 424=1}
You can then store these values in the database from the Map.
Upvotes: 1
Reputation: 261
You can do it in sql with a union clause:
select sum(c), sequence
from
(
select
count(*) as c, concat(col1 ,col2 , col3) as sequence
from t
group by col1, col2, col3
union
select
count(*) as c, concat(col2 ,col3 , col4) as sequence
from t
group by col2, col3, col4
union (... and so on enumerating through the column combinations)
) as tt
group by sequence
I would imagine a pure java implementation would be quicker and have less memory overhead. But if you already have it in the database it may be quick enough.
Upvotes: 0