serendipity
serendipity

Reputation: 862

Comparing data in two excel files and write corresponding mappings in third file

I have a list of drug substances with corresponding IDs in two excel sheets coming from two different data sources e.g.

Data from World health Organization: enter image description here

Data from another source looks similar but has different product IDs and additional products. However, the way substances are given IDs is standard across the board.

I have to read these two separate sheets and compare if some products match based on their substances. If they do I have to map the corresponding product IDs from both sheets to each other. So my final sheet will look like:

Product ID1 Product ID2 Substance1 Substance2 Substance3 Substance4 .....

Note - a product could contain more than 100 substances.

Here's the way I am trying to tackle this but need help:

  1. Read data from both the sheets and put in Maps

  2. Compare the two maps (this is where I am having trouble)

  3. Write data that mapped to an excel file.

public static void main(String[] args) {

    String readFile = "C:\\Users\\admin\\Desktop\\SampleData";
    HashMap<Double, Set<Object>> productMapWHO = new HashMap<Double, Set<Object>>();
    HashMap<Double, Set<Object>> productMapNDC = new HashMap<Double, Set<Object>>();
    productMapWHO = readExcel(0, readFile);
    productMapNDC = readExcel(1, readFile);

    Map<Double,Map<Double,Set<Object>>> WHOtoNDCMapping = new HashMap<Double,Map<Double,Set<Object>>>();

    WHOtoNDCMapping = compareProductMaps(productMapWHO,productMapNDC);

    String writeFile = "C:\\Users\\admin\\Desktop\\WHO_NDC_Mapping.xls";

    try {
        writeToExcel(WHOtoNDCMapping,writeFile);
    } catch (InvalidFormatException e) {
        e.printStackTrace();
    } catch (HPSFException e) {
        e.printStackTrace();
    }

}


private static HashMap<Double, Set<Object>> readExcel(int sheetNumber, String fileName) {


    HashMap<Double, Set<Object>> productMap = new HashMap<Double, Set<Object>>();

    try {
        FileInputStream file = new FileInputStream(new File(fileName));

        //Create Workbook instance holding reference to .xlsx file
        XSSFWorkbook workbook = new XSSFWorkbook(file);
        //Get first/desired sheet from the workbook
        XSSFSheet sheet = workbook.getSheetAt(sheetNumber);
        //Iterate through each rows one by one
        Iterator<Row> rowIterator = sheet.iterator();

        while (rowIterator.hasNext()) {

            List<String> substancelist = new ArrayList<String>();

            Row row = rowIterator.next();

            double key;
            Object value="";
            //substancelist.clear();
            Iterator<Cell> cellIterator = row.cellIterator();
            Cell cell =null;
            while (cellIterator.hasNext()) {

                if(cell.getColumnIndex() == 1)
                    key = cell.getNumericCellValue();

                switch (cell.getCellType())
                {
                case Cell.CELL_TYPE_NUMERIC:
                    value = cell.getNumericCellValue();
                    break;

                case Cell.CELL_TYPE_STRING:
                    value = cell.getStringCellValue().trim();
                    break;

                }

                Set<Object> list = productMap.get(key);
                if (list == null) productMap.put(key, list = new HashSet<Object>());
                list.add(value);
            }
        }
    }

    catch (Exception e) {
        e.printStackTrace();
    }
    return productMap;
}


private static Map<Double,Map<Double,Set<Object>>> compareProductMaps (HashMap<Double, Set<Object>>productMap1, HashMap<Double, Set<Object>>productMap2) {

    Map<Double,Map<Double,Set<Object>>> finalMapping = new HashMap<Double,Map<Double,Set<Object>>>();


    for(Map.Entry<Double, Set<Object>> entry : productMap1.entrySet()) {
        Double key = entry.getKey();
        Map<Double,Set<Object>> mappedIds = new HashMap<Double, Set<Object>>();
        for(Set<Object> valueList : productMap1.values()) {
            if (valueList.size() == productMap2.values().size() && productMap2.values().containsAll(valueList))
            {
                Double productId2 = productMap2.get(valueList); //throws error here. I want to get the key for the corresponding valuelist that matched.
                mappedIds.put(productId2,valueList);
                finalMapping.put(key,mappedIds);
            }
        }
    }


    return finalMapping;

}

private static void writeToExcel(Map<Double,Map<Double,Set<Object>>> finalMapping, String xlsFilename) throws HPSFException, InvalidFormatException {


    Workbook wb = null;

    try {
        wb = WorkbookFactory.create(new FileInputStream(xlsFilename));
    } catch (EncryptedDocumentException e) {
        e.printStackTrace();
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    Sheet sheet = wb.createSheet("WHOtoNDCMapping");

    int rowIdx = 0;
    int cellIdx = 0;

    // Header
    Row hssfHeader = sheet.createRow(rowIdx);


    rowIdx = 1;
    Row row = sheet.createRow(rowIdx++);
    cellIdx = 0;

    for(Double productId1 : finalMapping.keySet()) {
        Map<Double,Set<Object>> m1 = finalMapping.get(productId1);
        Cell cell = row.createCell(cellIdx++);
        cell.setCellValue(productId1);

        for(Double productId2 : m1.keySet()) {
            Set<Object> substanceList = m1.get(productId2);
            cell = row.createCell(cellIdx++);
            cell.setCellValue(productId2);

            for (Object substance : substanceList){
                if (substance instanceof String) {
                    cell.setCellValue((String) substance);
                } else if (substance instanceof Number) {
                    cell.setCellValue(((Number) substance).doubleValue());
                } else {
                    throw new RuntimeException("Cell value of invalid type " + substance);
                }
            }
        }
    }
    try {
        FileOutputStream out = new FileOutputStream(xlsFilename);
        wb.write(out);
        out.close();
    } catch (IOException e) {
        throw new HPSFException(e.getMessage());
    }
}

Upvotes: 2

Views: 272

Answers (1)

Thomas
Thomas

Reputation: 88747

I'd not use maps of maps of maps etc. but build a class that properly represents a product. If the structure of both files is the same you could use something like this (simplified, I'll leave some work for you ;) ):

class ExcelProduct {
  String productId;
  String productName;
  Set<String> substanceIds; //assuming order is not relevant, otherwise use a list
}

Then you could read that into a Map<String, ExcelProduct> where the key is the product id and finally work on those maps, e.g. by iterating over one of the maps and getting the according product from the second map:

for( ExcelProduct leftProduct : leftMap.values() ) {
  ExcelProduct rightProduct = rightMap.get(leftProduct.productId);

  //product not present in right map so skip
  if( rightProduct == null ) {
    continue;
  }

  //compare products here, e.g. comparing the substance ids
  if( leftProduct.substanceIds.equals( rightProduct.substanceIds) ) {
    //do whatever is needed, e.g. add the product to the result list which will be written to the result excel file
    //you probably don't need a result map here
  }
}

Note that whether leftProduct.substanceIds.equals( rightProduct.substanceIds) works correctly depends on what set implementation you're using, but the built-in implementations should use AbstractSet.equals() which if the passed object is a set as well will compare the sizes and whether one of the sets contains all elements of the other (which means that if all elements are there and the size is the same there can't be any missing/additional elements because sets can't contain duplicates).

Upvotes: 1

Related Questions