user6601906

Getting duplicate values in map for every key

I am trying to put all my values into a map, and I have more than 20k of them. My idea is that key 1 holds the values from 1 (call it i) to 1000 (i.e. i * 1000), and so on for the following keys. However, the output I'm getting contains duplicate values (keys 1 and 2 have the same values), and I'm not sure what I am doing wrong.

Here is the code:

 public class GetNumbers {

        public static List<String> createList() throws IOException {
            List<String> numbers = new LinkedList<>();
            Path path = null;
            File file = null;
            BufferedReader reader = null;
            String read = "";
            try {
                path = Paths.get("file.txt");
                file = path.toFile();
                reader = new BufferedReader(new FileReader(file));
                while ((read = reader.readLine()) != null) {
                    numbers.add(read);
                }
            } catch (FileNotFoundException e) {
                e.printStackTrace();
            }
            return numbers;
        }

        public static Map<Integer, List<String>> createNewFiles() throws IOException {
            Map<Integer, List<String>> myMap = new HashMap<>();
            List<String> getList = GetNumbers.createList();
            List<String> list = null;
            int count = getList.size() / 1000;
            // ---------------- doubtful code starts here ----------------
            for (int i = 1; i <= count; i++) {
                if (getList.size() > 1000) {
                    list = getList.subList(i, i * 1000);
                } else if (getList.size() < 999) {
                    list = getList.subList(i, getList.size());
                }
                // ---------------- doubtful code ends here ----------------
                myMap.put(i, list);
            }
            return myMap;

        }

        public static void getMap() throws IOException {
            Map<Integer, List<String>> map = GetNumbers.createNewFiles();
            List<String> listAtIndexOne = map.get(2);
            List<String> listAtIndexTwo = map.get(1);
            for (String elementFromFirstList : listAtIndexOne) {
                for (String elementFromSecondList : listAtIndexTwo) {
                    if (elementFromFirstList.equals(elementFromSecondList)) {
                        System.out.println("duplicate copy");
                    }
                }
            }

        }

        public static void main(String[] args) {
            try {
                GetNumbers.getMap();
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

EDIT

If I change my code to:

for (int i = 0; i <= count; i++) {
            if (getList.size() > (i * 1000)) {
                list = getList.subList(i, (i + 1) * 1000);
            } else if (getList.size() < 999) {
                list = getList.subList(i, getList.size());
            }
            myMap.put(i, list);
        }

I'm getting

Exception in thread "main" java.lang.IndexOutOfBoundsException: toIndex = 25000
    at java.util.SubList.<init>(Unknown Source)
    at java.util.AbstractList.subList(Unknown Source)
    at com.dnd.GetNumbers.createNewFiles(GetNumbers.java:43)
    at com.dnd.GetNumbers.getMap(GetNumbers.java:54)
    at com.dnd.GetNumbers.main(GetNumbers.java:69)

Any help is appreciated

Thanks

Upvotes: 2

Views: 96

Answers (2)

Peter Lawrey

Reputation: 533442

There are quite a few things I would change, but one bug is in this line:

subList(i, i * 1000);

On the first iteration you take the range 1 to 1000, which skips the value at index 0. On the second iteration you take 2 to 2000, which overlaps almost all of the first range, and so on.

Most likely what you intended was 0 to 999, then 1000 to 1999, and so on. BTW, performing a subList on a LinkedList is pretty inefficient.
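
To make the overlap concrete, here is a minimal sketch that compares the ranges from the question with the intended ones. The class name `SubListOverlapDemo`, its helper methods, and the list size of 2500 are assumptions for illustration only. The list holds the integers 0 to 2499, so each value equals its own index.

```java
import java.util.ArrayList;
import java.util.List;

class SubListOverlapDemo {

    // Builds a list holding the integers 0..size-1, so each value equals its index.
    static List<Integer> indices(int size) {
        List<Integer> values = new ArrayList<>(size);
        for (int i = 0; i < size; i++) {
            values.add(i);
        }
        return values;
    }

    // Counts the elements of a that also appear in b.
    static int countShared(List<Integer> a, List<Integer> b) {
        List<Integer> shared = new ArrayList<>(a);
        shared.retainAll(b);
        return shared.size();
    }

    public static void main(String[] args) {
        List<Integer> all = indices(2500);

        // Ranges from the question: subList(i, i * 1000) for i = 1 and i = 2.
        List<Integer> first = all.subList(1, 1000);   // indices 1..999
        List<Integer> second = all.subList(2, 2000);  // indices 2..1999
        System.out.println("shared (question): " + countShared(first, second));

        // Intended ranges: subList((i - 1) * 1000, i * 1000).
        List<Integer> fixedFirst = all.subList(0, 1000);     // indices 0..999
        List<Integer> fixedSecond = all.subList(1000, 2000); // indices 1000..1999
        System.out.println("shared (intended): " + countShared(fixedFirst, fixedSecond));
    }
}
```

Because the end index of `subList` is exclusive, `subList(1, 1000)` covers indices 1 to 999, and every one of those except index 1 also falls inside `subList(2, 2000)`.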

I would build these lists as you read the file.
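
As a rough sketch of that idea, the lines can be grouped into the map while they are being read, so no `subList` call is needed at all. The class name `ChunkedReader`, the method `readInChunks`, and its parameters are hypothetical names, not part of the original code, and `chunkSize` is assumed to be positive. A `StringReader` stands in for the `FileReader` on `file.txt` so the example runs on its own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ChunkedReader {

    // Reads all lines and groups them into lists of at most chunkSize lines.
    // Keys start at 1 to match the numbering used in the question.
    static Map<Integer, List<String>> readInChunks(BufferedReader reader, int chunkSize)
            throws IOException {
        Map<Integer, List<String>> chunks = new LinkedHashMap<>();
        List<String> current = null;
        for (String line; (line = reader.readLine()) != null; ) {
            // Start a new chunk when there is none yet or the current one is full.
            if (current == null || current.size() == chunkSize) {
                current = new ArrayList<>(chunkSize);
                chunks.put(chunks.size() + 1, current);
            }
            current.add(line);
        }
        return chunks;
    }

    public static void main(String[] args) throws IOException {
        // Five sample lines, grouped into chunks of at most two lines each.
        try (BufferedReader reader = new BufferedReader(new StringReader("a\nb\nc\nd\ne"))) {
            System.out.println(readInChunks(reader, 2));
        }
    }
}
```

Each chunk is its own `ArrayList`, so the chunks cannot share elements, and a final shorter chunk is kept rather than dropped.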


I would write it like this:

public static void splitFile(String inputFile, String outputTemplate, int count) throws IOException {
    int fileCount = 0, lineCount = 0;
    // check for duplicates.
    Set<String> previous = new HashSet<>();
    // file to write to
    PrintWriter pw = null;
    // file to read from
    try (BufferedReader in = new BufferedReader(new FileReader(inputFile))) {
        // while there is another line to read.
        for (String line; (line = in.readLine()) != null; ) {
            // skip duplicates.
            if (!previous.add(line))
                continue;
            // if we have reached the line limit or haven't started a file yet.
            if (pw == null || lineCount++ >= count) {
                // close the old one if there was one.
                if (pw != null)
                    pw.close();
                // start a new file using the template i.e. where do we put the number.
                pw = new PrintWriter(String.format(outputTemplate, fileCount++));
                // we will have one line in this file.
                lineCount = 1;
            }
            // add the line.
            pw.println(line);
        }
    }
    // close the file if we had one left open.
    if (pw != null)
        pw.close();
}

public static void main(String[] args) throws IOException {
    // split the file into multiple files with up to 1000 lines each.
    splitFile("file.txt", "file-part-%d.txt", 1000);
}

Upvotes: 2

Eran

Reputation: 393771

To split the list into sub-lists of 1000 elements, you can write something like this:

        for (int i = 1; i <= count; i++) {
            if (getList.size() >= i*1000) {
                list = getList.subList((i-1) * 1000, i * 1000);
            } else {
                list = getList.subList((i-1) * 1000, getList.size());
            }
            myMap.put(i, list);
        }

Or, more simply:

        for (int i = 1; i <= count; i++) {
            list = getList.subList((i-1) * 1000, Math.min(getList.size(),i * 1000));
            myMap.put(i, list);
        }

Note that the indices are 0-based and the end index of subList is exclusive, so the first sub-list will cover indices 0 to 999, the second 1000 to 1999, and so on.
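
One caveat worth checking: if `count` is still computed as `getList.size() / 1000`, as in the question, integer division rounds down, so a trailing chunk with fewer than 1000 elements would never be reached by the loop. The following is a minimal sketch, assuming a hypothetical helper `partition` in a class `ListPartitioner` (neither is from the original code), that rounds the chunk count up and copies each view into its own list.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class ListPartitioner {

    // Splits source into consecutive chunks of at most chunkSize elements.
    // Keys start at 1; chunkSize is assumed to be positive.
    static <T> Map<Integer, List<T>> partition(List<T> source, int chunkSize) {
        Map<Integer, List<T>> chunks = new HashMap<>();
        // Round up so that a final, shorter chunk is not dropped.
        int count = (source.size() + chunkSize - 1) / chunkSize;
        for (int i = 1; i <= count; i++) {
            int from = (i - 1) * chunkSize;
            int to = Math.min(source.size(), i * chunkSize);
            // Copy the view so that each chunk is independent of the source list.
            chunks.put(i, new ArrayList<>(source.subList(from, to)));
        }
        return chunks;
    }
}
```

Calling `ListPartitioner.partition(getList, 1000)` would then return one entry per block of up to 1000 elements. Copying each `subList` view costs extra memory, but it means later changes to the source list cannot invalidate the chunks.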

Upvotes: 2
