janopan

Reputation: 47

Compare two text files, list the lines that differ, and show which file each differing line came from

I am reading two text files (which may contain duplicates) with Scanner and storing their lines in two ArrayLists. I then compare the two lists to find the differences. When I print the result I can see which lines differ, but I can't tell which file (text file name) each line came from.

Contents in text1.txt

TIMESTAMP,FE,TDI,20190703113119,20190601000000,20190701000000,
TIMESTAMP,FE,KYMI,20190703113130,20190601000000,20190701000000,
TIMESTAMP,FE,UMRI,20190703113154,20190601000000,20190701000000,
TIMESTAMP,FE,MLI,20190703113211,20190601000000,20190701000000,
TIMESTAMP,FE,WOLI,20190703113221,20190601000000,20190701000000,
TIMESTAMP,FE,VEM,20190703113221,20190601000000,20190701000000,
TIMESTAMP,FE,ZER,20190703113154,20190601000000,20190701000000,

Contents in text2.txt

TIMESTAMP,FE,TDL,20190703113119,20190601000000,20190701000000,
TIMESTAMP,FE,KYMA,20190703113130,20190601000000,20190701000000,
TIMESTAMP,FE,UMRC,20190703113154,20190601000000,20190701000000,
TIMESTAMP,FE,MLW,20190703113211,20190601000000,20190701000000,
TIMESTAMP,FE,WOLF,20190703113221,20190601000000,20190701000000,
TIMESTAMP,FE,VEM,20190703113221,20190601000000,20190701000000,
TIMESTAMP,FE,ZER,20190703113154,20190601000000,20190701000000,

code:

// Read the non-blank, trimmed lines of the first file
Scanner prodScanner = new Scanner(prodFile);
while (prodScanner.hasNextLine()) {
    String currentRecord = prodScanner.nextLine().trim();
    if (currentRecord.length() > 0) {
        prodRecordsFromStatement.add(currentRecord);
    }
}

// Read the non-blank, trimmed lines of the second file
Scanner nonProdScanner = new Scanner(nonProdFile);
while (nonProdScanner.hasNextLine()) {
    String currentRecord = nonProdScanner.nextLine().trim();
    if (currentRecord.length() > 0) {
        nonProdRecordsFromStatement.add(currentRecord);
    }
}

// Keep only the lines that are not matched in the other list, then sort them
Collection<String> result = new ArrayList<>(CollectionUtils.disjunction(prodRecordsFromStatement, nonProdRecordsFromStatement));
List<String> resultList = new ArrayList<>(result);
Collections.sort(resultList);

Actual Results:

TIMESTAMP,FE,KYMA,20190703113130,20190601000000,20190701000000,
TIMESTAMP,FE,KYMI,20190703113130,20190601000000,20190701000000,
TIMESTAMP,FE,MLI,20190703113211,20190601000000,20190701000000,
TIMESTAMP,FE,MLW,20190703113211,20190601000000,20190701000000,
TIMESTAMP,FE,TDI,20190703113119,20190601000000,20190701000000,
TIMESTAMP,FE,TDL,20190703113119,20190601000000,20190701000000,
TIMESTAMP,FE,UMRC,20190703113154,20190601000000,20190701000000,
TIMESTAMP,FE,UMRI,20190703113154,20190601000000,20190701000000,
TIMESTAMP,FE,WOLF,20190703113221,20190601000000,20190701000000,
TIMESTAMP,FE,WOLI,20190703113221,20190601000000,20190701000000,

Expected Results: I want each line prefixed with the name of the file (list) it came from, so the output is easier to read

text2.txt,TIMESTAMP,FE,KYMA,20190703113130,20190601000000,20190701000000,
text1.txt,TIMESTAMP,FE,KYMI,20190703113130,20190601000000,20190701000000,
text1.txt,TIMESTAMP,FE,MLI,20190703113211,20190601000000,20190701000000,
text2.txt,TIMESTAMP,FE,MLW,20190703113211,20190601000000,20190701000000,
text1.txt,TIMESTAMP,FE,TDI,20190703113119,20190601000000,20190701000000,
text2.txt,TIMESTAMP,FE,TDL,20190703113119,20190601000000,20190701000000,
text2.txt,TIMESTAMP,FE,UMRC,20190703113154,20190601000000,20190701000000,
text1.txt,TIMESTAMP,FE,UMRI,20190703113154,20190601000000,20190701000000,
text2.txt,TIMESTAMP,FE,WOLF,20190703113221,20190601000000,20190701000000,
text1.txt,TIMESTAMP,FE,WOLI,20190703113221,20190601000000,20190701000000,

Upvotes: 1

Views: 91

Answers (2)

Trevor Freeman

Reputation: 7232

How performant does your solution need to be? If performance is not super critical, and your lists are not long, then you could switch to using subtract instead of disjunction.

E.g.

Collection<String> resultProdRecords = new ArrayList<>(CollectionUtils.subtract(prodRecordsFromStatement, nonProdRecordsFromStatement));
Collection<String> resultNonProdRecords = new ArrayList<>(CollectionUtils.subtract(nonProdRecordsFromStatement, prodRecordsFromStatement));

resultProdRecords will contain all the lines from prodRecordsFromStatement that are not also in nonProdRecordsFromStatement.

resultNonProdRecords will contain all the lines from nonProdRecordsFromStatement that are not also in prodRecordsFromStatement.
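
To get the output format from the question, the two subtraction results can each be tagged with a file name and then merged. The following is a rough sketch only: it uses a plain-JDK stand-in for CollectionUtils.subtract (one occurrence removed per match, so duplicates are counted), and the class name, method names, and the file-name parameters are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

class SubtractDiffSketch {

    // JDK-only stand-in for CollectionUtils.subtract (an assumption, not the library code):
    // each element of b removes at most one matching occurrence from a copy of a.
    static List<String> subtract(List<String> a, List<String> b) {
        List<String> result = new ArrayList<>(a);
        for (String item : b) {
            result.remove(item); // List.remove(Object): drops the first match, if any
        }
        return result;
    }

    // Tags each unmatched line with the (hypothetical) name of the file it came from,
    // then orders the combined output by the record itself rather than by file name.
    static List<String> labeledDiff(String prodName, List<String> prod,
                                    String nonProdName, List<String> nonProd) {
        List<String[]> tagged = new ArrayList<>();
        for (String line : subtract(prod, nonProd)) {
            tagged.add(new String[] {prodName, line});
        }
        for (String line : subtract(nonProd, prod)) {
            tagged.add(new String[] {nonProdName, line});
        }
        tagged.sort(Comparator.comparing((String[] pair) -> pair[1]));

        List<String> output = new ArrayList<>();
        for (String[] pair : tagged) {
            output.add(pair[0] + "," + pair[1]);
        }
        return output;
    }

    public static void main(String[] args) {
        // Two lines from each sample file in the question
        List<String> prod = Arrays.asList(
                "TIMESTAMP,FE,TDI,20190703113119,20190601000000,20190701000000,",
                "TIMESTAMP,FE,VEM,20190703113221,20190601000000,20190701000000,");
        List<String> nonProd = Arrays.asList(
                "TIMESTAMP,FE,TDL,20190703113119,20190601000000,20190701000000,",
                "TIMESTAMP,FE,VEM,20190703113221,20190601000000,20190701000000,");
        for (String line : labeledDiff("text1.txt", prod, "text2.txt", nonProd)) {
            System.out.println(line);
        }
    }
}
```

Removing one element at a time from a list is quadratic in the worst case, which fits the "lists are not long" caveat above.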

Upvotes: 1

Jason

Reputation: 11832

Iterate through resultList checking to see if the current item is also in prodRecordsFromStatement.

If so, it's from file 1, otherwise it's from file 2.
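
A minimal sketch of that loop, assuming resultList and prodRecordsFromStatement are the lists from the question. The class name, method name, and file-name parameters are hypothetical, and a HashSet is swapped in for the membership check so each lookup does not rescan the list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class TagBySourceSketch {

    // Prefixes every differing line with the file it is assumed to come from:
    // prodName if the line occurs in prodRecords, otherwise nonProdName.
    static List<String> tagBySource(List<String> resultList, List<String> prodRecords,
                                    String prodName, String nonProdName) {
        Set<String> prodSet = new HashSet<>(prodRecords); // constant-time contains()
        List<String> tagged = new ArrayList<>();
        for (String line : resultList) {
            String source = prodSet.contains(line) ? prodName : nonProdName;
            tagged.add(source + "," + line);
        }
        return tagged;
    }

    public static void main(String[] args) {
        // Illustrative data: one differing line per file, already sorted
        List<String> prod = Arrays.asList(
                "TIMESTAMP,FE,KYMI,20190703113130,20190601000000,20190701000000,");
        List<String> resultList = Arrays.asList(
                "TIMESTAMP,FE,KYMA,20190703113130,20190601000000,20190701000000,",
                "TIMESTAMP,FE,KYMI,20190703113130,20190601000000,20190701000000,");
        for (String line : tagBySource(resultList, prod, "text1.txt", "text2.txt")) {
            System.out.println(line);
        }
    }
}
```

One caveat with duplicates: if the same line occurs in both files but a different number of times, it still shows up in the disjunction, and this check will always attribute it to file 1, even when the extra copies are in file 2.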

Upvotes: 1
