Reputation: 343
I need to compare results of two lists coming from two different sources.
List<MyData> baseList = new ArrayList<>();
and
List<MyData> externalList = new ArrayList<>();
I need to compare the CDCHash of the records in both lists, matched by UserACCNUM. If the CDCHash has changed, I need to update that particular record in baseList.
I tried the nested loop below, which doesn't seem efficient to me:
for (MyData ext : externalList) {
    for (MyData base : baseList) {
        if (ext.getCDCHash().equals(base.getCDCHash()) && ext.getAccNum().equals(base.getAccNum())) {
            // no change
        }
        else {
            // changes found - need to update
        }
    }
}
Is list.stream() efficient in this case? I have nearly 100k records to compare.
How do I achieve this efficiently?
Upvotes: 2
Views: 523
Reputation: 43718
You can transform your quadratic algorithm into a linear one by creating a fast lookup Map for one of the two lists and then looping over the other list, using the lookup to find the corresponding record by account number.
JS example just because we can't run Java here ;) Note that we assume both lists are of the same length for the sake of the example.
const listA = [{ hash: 'account1v1', account: 1 }, { hash: 'account2v1', account: 2 }];
const listB = [{ hash: 'account1v1', account: 1 }, { hash: 'account2v2', account: 2 }];
const dirtyRecords = findDirtyRecords(listA, listB);
console.log(dirtyRecords);
function findDirtyRecords(listA, listB) {
const listAMap = new Map();
for (const record of listA) listAMap.set(record.account, record);
return listB.filter(r => r.hash !== listAMap.get(r.account).hash);
}
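For reference, a rough Java sketch of the same lookup-map idea. The MyData class here is a hypothetical stand-in (the getter names follow the question, while the String fields, the setter, and the syncHashes helper are assumptions), and accounts that exist only in the external list are simply skipped:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;

class HashSync {
    // Hypothetical stand-in for the question's MyData; field types are assumptions.
    static class MyData {
        private final String accNum;
        private String cdcHash;

        MyData(String accNum, String cdcHash) {
            this.accNum = accNum;
            this.cdcHash = cdcHash;
        }

        String getAccNum() { return accNum; }
        String getCDCHash() { return cdcHash; }
        void setCDCHash(String cdcHash) { this.cdcHash = cdcHash; }
    }

    // Updates baseList in place and returns the base records whose hash changed.
    static List<MyData> syncHashes(List<MyData> baseList, List<MyData> externalList) {
        // One pass to index the base records by account number.
        Map<String, MyData> baseByAccNum = new HashMap<>();
        for (MyData base : baseList) {
            baseByAccNum.put(base.getAccNum(), base);
        }

        // One pass over the external records, each with a constant-time lookup.
        List<MyData> updated = new ArrayList<>();
        for (MyData ext : externalList) {
            MyData base = baseByAccNum.get(ext.getAccNum());
            if (base == null) {
                continue; // account only exists externally; what to do here is left open
            }
            if (!Objects.equals(base.getCDCHash(), ext.getCDCHash())) {
                base.setCDCHash(ext.getCDCHash());
                updated.add(base);
            }
        }
        return updated;
    }

    public static void main(String[] args) {
        List<MyData> baseList = new ArrayList<>(List.of(
                new MyData("1", "account1v1"), new MyData("2", "account2v1")));
        List<MyData> externalList = List.of(
                new MyData("1", "account1v1"), new MyData("2", "account2v2"));
        List<MyData> updated = syncHashes(baseList, externalList);
        System.out.println(updated.size() + " record(s) updated");
    }
}
```

With 100k records this is two linear passes instead of roughly 10^10 comparisons. A stream version would not change that; the speedup comes from the map, not from the iteration style.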
Upvotes: 2
Reputation: 106410
A little bit of set theory may be beneficial here, if MyData implements Comparable, equals, and hashCode, and you're open to using Google Guava.
If you set up the two lists that you have as Sets instead (and they could be ordered if you really wanted them to be...), then all you would have to do is invoke Sets.difference(baseList, externalList). You could then iterate through that resulting collection of records to update the values you need to in baseList.
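To illustrate the set-difference idea without pulling in a dependency, here is a small sketch that swaps Guava's Sets.difference for a plain java.util copy plus removeAll. The MyData class and the difference helper are hypothetical, and it assumes equals and hashCode cover both the account number and the hash, so a record whose hash changed counts as a different element:

```java
import java.util.HashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

class HashSetDiff {
    // Hypothetical value type; equality covers both the account number and the hash.
    static final class MyData {
        final String accNum;
        final String cdcHash;

        MyData(String accNum, String cdcHash) {
            this.accNum = accNum;
            this.cdcHash = cdcHash;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof MyData)) return false;
            MyData other = (MyData) o;
            return accNum.equals(other.accNum) && cdcHash.equals(other.cdcHash);
        }

        @Override
        public int hashCode() {
            return Objects.hash(accNum, cdcHash);
        }
    }

    // Elements of 'left' that have no equal element in 'right' (stdlib stand-in for Sets.difference).
    static <T> Set<T> difference(Set<T> left, Set<T> right) {
        Set<T> result = new HashSet<>(left);
        result.removeAll(right);
        return result;
    }

    public static void main(String[] args) {
        Set<MyData> base = new HashSet<>(List.of(
                new MyData("1", "account1v1"), new MyData("2", "account2v1")));
        Set<MyData> external = new HashSet<>(List.of(
                new MyData("1", "account1v1"), new MyData("2", "account2v2")));

        // Base records that are stale or missing on the external side.
        for (MyData stale : difference(base, external)) {
            System.out.println("needs update: account " + stale.accNum);
        }
    }
}
```

Note that the difference only tells you which base records no longer match. To get the new hash values you would still look up the external record for each of those accounts, or take the difference in the other direction.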
Don't concern yourself with doing this in one fell swoop. It's better and more succinct to do this as two separate actions so that it's easier to debug and establish what's going on.
Upvotes: 1
Reputation: 5578
Well, first of all, answering your question as asked might not solve your problem.
As far as I can see from the tables you provided, your hash does change, and the values might change. The unique identifier is most likely the user acc num.
Depending on the source of your data, it might make sense to iterate/paginate over both of your sources (if they're ordered by some parameter, e.g. acct num) and compare just subsets of the data.
Let's say you query accounts 1-20 (or 1-1000), get the min/max acct num, and then run the same query on the second source of data to get the same accounts.
Then sort and iterate both collections in parallel (trying to match the IDs) and compare the values on each line.
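The sort-and-iterate step for one batch could look roughly like the sketch below. Everything in it is illustrative: the MyData shape, the numeric account number, and the changedAccounts helper are assumptions, and fetching the batches from the two sources is left out entirely.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class SortedCompare {
    // Hypothetical record shape; a numeric account number is an assumption.
    static final class MyData {
        final long accNum;
        final String cdcHash;

        MyData(long accNum, String cdcHash) {
            this.accNum = accNum;
            this.cdcHash = cdcHash;
        }
    }

    // Returns the account numbers present in both batches whose hashes differ.
    static List<Long> changedAccounts(List<MyData> baseBatch, List<MyData> externalBatch) {
        // Sort copies so the caller's lists are left untouched.
        List<MyData> base = new ArrayList<>(baseBatch);
        List<MyData> ext = new ArrayList<>(externalBatch);
        Comparator<MyData> byAccNum = Comparator.comparingLong(d -> d.accNum);
        base.sort(byAccNum);
        ext.sort(byAccNum);

        // Walk both sorted lists in lockstep, advancing whichever side is behind.
        List<Long> changed = new ArrayList<>();
        int i = 0, j = 0;
        while (i < base.size() && j < ext.size()) {
            MyData b = base.get(i);
            MyData e = ext.get(j);
            if (b.accNum < e.accNum) {
                i++; // account missing from the external batch
            } else if (b.accNum > e.accNum) {
                j++; // account missing from the base batch
            } else {
                if (!b.cdcHash.equals(e.cdcHash)) {
                    changed.add(b.accNum);
                }
                i++;
                j++;
            }
        }
        return changed;
    }

    public static void main(String[] args) {
        List<MyData> base = List.of(new MyData(2, "account2v1"), new MyData(1, "account1v1"));
        List<MyData> ext = List.of(new MyData(1, "account1v1"), new MyData(2, "account2v2"));
        System.out.println("changed accounts: " + changedAccounts(base, ext));
    }
}
```

If both sources already return rows ordered by account number, the two sort calls can be dropped and the comparison stays linear in the batch size.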
Upvotes: 0