Lewis

Reputation: 466

Read and discard data CSV

I have one CSV file with a single row containing different users (users.csv), and I also have a second CSV file with users (users2.csv). I want to compare these two files and discard from users2.csv any user that already exists in users.csv. Any ideas or advice on how I could do this?

Upvotes: 1

Views: 77

Answers (3)

Mak

Reputation: 1078

The best way I see:

1) Read both files separately using the Java NIO API (which is quite fast) and store their lines in lists.

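    // Needs java.io.IOException, java.nio.file.*, java.util.*, java.util.stream.Stream
    Path path = Paths.get("src/main/resources/shakespeare.txt");
    List<String> lines = new ArrayList<>();
    // Files.lines returns a lazy stream that should be closed, hence try-with-resources
    try (Stream<String> stream = Files.lines(path)) {

      stream.forEach(lines::add);//store each line in the list

    } catch (IOException ex) {
      ex.printStackTrace();//handle exception here
    }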

2) Compare both lists using a Java 8 Predicate.

    public static List<String> filterAndGetEmployees(List<String> employees,
        Predicate<String> predicate) {
        // Keep only the entries that satisfy the predicate
        return employees.stream().filter(predicate).collect(Collectors.toList());
    }
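As a rough sketch of how such a predicate could be built for this question: the class name UserFilter, the method discardKnown and the sample user names are made up for illustration, and each list entry is assumed to be one user.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the answer above.
final class UserFilter {

    // Returns the entries of 'candidates' that do not appear in 'known'.
    static List<String> discardKnown(List<String> candidates, List<String> known) {
        // A HashSet gives constant-time lookups, so the filter is a single pass.
        Set<String> knownSet = new HashSet<>(known);
        Predicate<String> isNew = user -> !knownSet.contains(user);
        return candidates.stream().filter(isNew).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> users = Arrays.asList("alice", "bob");
        List<String> users2 = Arrays.asList("bob", "carol", "dave");
        System.out.println(discardKnown(users2, users)); // prints [carol, dave]
    }
}
```

Copying the known users into a set first avoids rescanning the whole list for every candidate.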

3) If you want to write the result back to a file, you can do it like this:

    Path path = Paths.get("src/main/resources/shakespeare.txt");
    // StandardCharsets.UTF_8 (java.nio.charset) avoids the charset lookup by name
    try (BufferedWriter writer = Files.newBufferedWriter(path, StandardCharsets.UTF_8)) {
        writer.write("To be, or not to be. That is the question.");
    } catch (IOException ex) {
        ex.printStackTrace();
    }

Hope this helps.

Upvotes: 1

Conffusion

Reputation: 4465

  • Load the first file into a List<String> users.
  • Load the second file into a List<String> users2.
  • Use Apache Commons Collections CollectionUtils.removeAll(Collection<E> users, Collection<?> users2), which returns a collection with the elements of users that are not in users2.

To load a file in a list you can find inspiration here.

Et voilà.
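The steps above can be sketched with the JDK alone. Note that this swaps CollectionUtils.removeAll for the standard List.removeAll so the example needs no extra dependency. The class name CsvDiscard, the output file name users2-filtered.csv, and the assumption of one user per line are all illustrative.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;

// Hypothetical helper; assumes each line of a file holds exactly one user.
final class CsvDiscard {

    // Returns the lines of 'candidatesFile' that do not occur in 'knownFile'.
    static List<String> discard(Path candidatesFile, Path knownFile) throws IOException {
        List<String> known = Files.readAllLines(knownFile, StandardCharsets.UTF_8);
        // Copy into an ArrayList so that removeAll is guaranteed to be supported.
        List<String> remaining =
            new ArrayList<>(Files.readAllLines(candidatesFile, StandardCharsets.UTF_8));
        // Passing a HashSet keeps each membership check cheap.
        remaining.removeAll(new HashSet<>(known));
        return remaining;
    }

    public static void main(String[] args) throws IOException {
        // File names from the question; the output name is made up.
        List<String> remaining = discard(Paths.get("users2.csv"), Paths.get("users.csv"));
        Files.write(Paths.get("users2-filtered.csv"), remaining, StandardCharsets.UTF_8);
    }
}
```

If the users really sit in a single comma-separated row, as the question suggests, each file's line would need to be split on commas before the comparison.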

This only works if the files are small enough to load into memory. Otherwise it requires another approach, such as sorting both files with command-line sort commands, then walking through both files line by line and deciding for each line whether to write it to the output.
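A minimal sketch of that line-by-line walk, assuming both inputs hold one user per line and are already sorted in the same order String.compareTo uses (for plain ASCII data, a byte-order sort such as LC_ALL=C sort should agree, but that is an assumption to verify on your data). The class name SortedDiscard is made up.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.StringReader;
import java.io.Writer;

// Hypothetical helper: streams two sorted inputs, so neither has to fit in memory.
final class SortedDiscard {

    // Writes every line of 'candidates' that is absent from 'known'.
    // Both inputs must already be sorted in String.compareTo order.
    static void discard(Reader candidates, Reader known, Writer out) throws IOException {
        BufferedReader c = new BufferedReader(candidates);
        BufferedReader k = new BufferedReader(known);
        BufferedWriter w = new BufferedWriter(out);
        String knownLine = k.readLine();
        String line;
        while ((line = c.readLine()) != null) {
            // Advance the known side until it catches up with the current candidate.
            while (knownLine != null && knownLine.compareTo(line) < 0) {
                knownLine = k.readLine();
            }
            // Keep the candidate only if the known side has no equal line.
            if (knownLine == null || !knownLine.equals(line)) {
                w.write(line);
                w.newLine();
            }
        }
        w.flush();
    }

    public static void main(String[] args) throws IOException {
        // In-memory stand-ins for the two sorted files; prints carol and dave, one per line.
        discard(new StringReader("bob\ncarol\ndave\n"),
                new StringReader("alice\nbob\n"),
                new OutputStreamWriter(System.out));
    }
}
```

For real files, the two StringReader arguments could be replaced with Files.newBufferedReader(...) and the writer with Files.newBufferedWriter(...).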

Upvotes: 2

Bala Ji

Reputation: 51

You can use Beyond Compare to compare the two CSVs. It will clearly identify the missing users along with any other data mismatches. If you want to do it programmatically, you can copy each CSV into a list/map of user beans and override the equals method of the bean to compare the username or any other field you want.
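A possible shape for such a bean, as a sketch only: the User class, its email field, and the "username,email" line layout are assumptions, since the question does not show the actual columns. hashCode is overridden together with equals so the bean also behaves correctly in hash-based collections.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical bean; the real CSV columns are not given in the question.
final class User {
    private final String username;
    private final String email; // assumed extra column, ignored by equals

    User(String username, String email) {
        this.username = username;
        this.email = email;
    }

    // Parses a line of the assumed form "username,email" (no quoted commas).
    static User fromCsvLine(String line) {
        String[] parts = line.split(",", -1);
        return new User(parts[0].trim(), parts.length > 1 ? parts[1].trim() : "");
    }

    // Two users are considered the same when their usernames match.
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof User)) return false;
        return username.equals(((User) o).username);
    }

    @Override
    public int hashCode() {
        return username.hashCode();
    }

    @Override
    public String toString() {
        return username;
    }

    public static void main(String[] args) {
        List<User> users = Arrays.asList(fromCsvLine("bob,bob@example.com"));
        List<User> users2 = new ArrayList<>(Arrays.asList(
            fromCsvLine("bob,robert@example.org"),
            fromCsvLine("carol,carol@example.com")));
        users2.removeAll(users); // relies on the equals override above
        System.out.println(users2); // prints [carol]
    }
}
```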

Upvotes: 1
