user7146946

Reputation: 111

Unable to read a CSV file produced by FileWriter and find duplicates in Java

I am trying to read the first CSV file, which was created with FileWriter.

The first CSV file contains the rows whose entity name (column[1]) appears more than 10 times.

After reading the first CSV file, I want to check for duplicates in column[5] (i.e. the tweet token) and write the result to the second CSV file. I tried the .contains method, but it does not detect the duplicates.

Update: I can now read the file successfully, but I am not able to remove the duplicates in eventDetectionToken().

Here is the code:

import java.io.*;
import java.util.*;

public class EventDetectioncopy {
    public static void main(String[] args) throws FileNotFoundException, IOException{
        //1st csv file
        System.out.print("Enter a name for new Tweet Cluster sorting by name entity: ");
        BufferedReader scanName = new BufferedReader(new InputStreamReader(System.in));
        String newNamefile = scanName.readLine();

        //2nd csv file
        System.out.print("Enter a name for new Tweet Cluster sorting by tweet tokens: ");
        BufferedReader scanToken = new BufferedReader(new InputStreamReader(System.in));
        String newTokenfile = scanToken.readLine();

        try {
            eventDetectionName(newNamefile);
            eventDetectionToken(newNamefile, newTokenfile);

        }
        catch (FileNotFoundException e) {
            System.out.println(e);
        }
        catch (IOException e){

        }
    }

    public static void eventDetectionToken(String fileInput, String fileOutput) throws FileNotFoundException, IOException{
        FileWriter newCsv = new FileWriter(fileOutput + "." + "csv");
        BufferedWriter newCsvBW = new BufferedWriter(newCsv);
        BufferedReader reader = new BufferedReader(new FileReader(fileInput + ".csv"));
        String data;

        try{
            String temp = null;
            List<String> tempList = new ArrayList<String>();

            do
            {
                data = reader.readLine();
                String tweetToken = null;

                if(data != null)
                {
                    String[] splitText = data.split(",");
                    tweetToken = splitText[5];
                }

                if(temp != null)
                {
                    if(data == null || tweetToken.contains(tweetToken))
                    {
                        if(!(temp.equals(tweetToken)))
                        {
                            for (int i = 0; i < tempList.size(); i ++)
                            {
                                newCsvBW.append(tempList.get(i));
                                newCsvBW.append("\n");
                                System.out.println(tempList.get(i));
                            }
                         }

                        tempList.clear();
                        temp = tweetToken;
                    }
                }
                else
                {
                    temp = tweetToken;
                }
                tempList.add(data);
            }
            while(data != null);
        }
        finally
        {
            newCsvBW.close();
            reader.close();
        }
    }

    public static void eventDetectionName(String filename) throws FileNotFoundException, IOException{
        String csv = "1day/clusters.sortedby.clusterid.csv";
        FileWriter newCsv = new FileWriter(filename + "." + "csv");
        BufferedWriter newCsvBW = new BufferedWriter(newCsv);
        BufferedReader reader = new BufferedReader(new FileReader(csv));
        String data;

        try{
            String temp = null;
            List<String> tempList = new ArrayList<String>();
            List<Long> tempTime = new ArrayList<Long>();
            do 
            {
                data = reader.readLine();
                String nameEntity = null;
                if (data != null) 
                {
                    String[] splitText = data.split(",");
                    nameEntity = splitText[1];
                }
                if (temp != null) 
                {
                    if (data == null || !(nameEntity.equals(temp))) 
                    {
                        if (tempList.size() >= 10) 
                        {
                            for (int i = 0; i < tempList.size(); i++) 
                            {
                                newCsvBW.append(tempList.get(i));
                                newCsvBW.append("\n");
                                System.out.println(tempList.get(i));
                            }
                        }
                        tempList.clear();
                        temp = nameEntity;
                    }
                } 
                else 
                {
                    temp = nameEntity;
                }
                tempList.add(data);
            } 
            while (data != null);
        }
        finally
        {
            reader.close();
            newCsvBW.close();
        }

    }
}

Below is some of the content of the original CSV file, "clusters.sortedby.clusterid.csv", before running EventDetectioncopy.java. It contains duplicate tweet tokens (column[5]). The columns are: [clusterid], [name entity], [tweetid], [timestamp], [userid], [tweet token], [tweet text]

1   rick ross   2.5582E+17  1.34983E+12 389746870   rick ross dice pineappl Rick Ross x diced pineapples
1   rick ross   2.5582E+17  1.34983E+12 56082039    dice pineappl uhhh rick ross voic   Diced Pineapples. UHHH *Rick Ross voice*
1   rick ross   2.55821E+17 1.34983E+12 870278689   rick ross trend Why is Rick Ross trending?
1   rick ross   2.55822E+17 1.34983E+12 379948188   lmfao rick ross grunt   Lmfao he did that rick ross grunt .
1   rick ross   2.55822E+17 1.34983E+12 276594374   play rick ross  they played w| rick ross !
1   rick ross   2.55822E+17 1.34983E+12 386219877   rick ross ugli  Rick Ross So Ugly ..
1   rick ross   2.55822E+17 1.34983E+12 53327754    wanna play rick ross belli  I Wanna Play in Rick Ross Belly..!
1   rick ross   2.55824E+17 1.34983E+12 19690034    rick ross dice pineappl ft wale amp drake video via laleak  Rick Ross - Diced Pineapples ft. Wale &amp; Drake (Video) via @laleakers
1   rick ross   2.55825E+17 1.34983E+12 357250991   husband rick ross   where my husband rick ross
1   rick ross   2.55825E+17 1.34983E+12 53734179    throw rick ross kirko bangz *Throws Rick ross At Kirko Bangz*
1   rick ross   2.55825E+17 1.34983E+12 462179553   rick ross stay fresh    Rick Ross Stay Fresh!!!!
1   rick ross   2.55827E+17 1.34983E+12 46744853    offici music video dice pineappl rick ross drake wale   Official Music Video " Diced Pineapples" Rick Ross / Drake / Wale
1   rick ross   2.55829E+17 1.34983E+12 461725574   saw rick ross uhhh ifxckgaygirl dadd    i saw rick ross their .. uhhh @ifxckgaygirls dadd :p
1   rick ross   2.55832E+17 1.34983E+12 283244204   rick ross wavi fat guy  Rick Ross is a wavy fat guy
1   rick ross   2.55832E+17 1.34983E+12 528834435   rick ross dice pineappl Rick Ross - Diced Pineapples
1   rick ross   2.55835E+17 1.34983E+12 463279022   rick ross featur wale amp drake dice pineappl ricki ross experi downtim less 24 hour    Rick Ross featuring Wale &amp; Drake – Diced Pineapples:   Ricky Ross experiences no downtime as less than 24 hours ...
1   rick ross   2.55835E+17 1.34983E+12 28460245    yuck lalasodiddi need husband rick ross take award home hiphiopaward    YUCK! RT @LalaSoDiddy: I need my husband Rick Ross to take some awards home #HipHiopAwards
1   rick ross   2.55836E+17 1.34983E+12 330811468   kingkennzi rick ross round  “@KingKennzie: Rick Ross is very round.”
1   rick ross   2.55836E+17 1.34983E+12 124024753   rick ross titti Rick Ross Titties!
1   rick ross   2.55836E+17 1.34983E+12 765822380   rick ross titti tho Rick Ross and them titties tho!!!
2   tyler oakley    2.55821E+17 1.34983E+12 867420925   know someth trend new asktyl tyleroakley live   HOW DO YOU KNOW WHEN SOMETHING IS TRENDING? IM NEW TO THIS... #aSKTYLER
2   tyler oakley    2.55822E+17 1.34983E+12 504044044   asktyl get perfect quiff tyleroakley live   #AskTyler How do you get a perfect quiff :)?
2   tyler oakley    2.55822E+17 1.34983E+12 709347721   asktyl realli homework right now tyleroakley live   #asktyler i really should be doing homework right now
2   tyler oakley    2.55822E+17 1.34983E+12 171667747   obsess right now asktyl tyleroakley live    what is your obsession right now? #asktyler
3   wiz khalifa 2.5582E+17  1.34983E+12 588829718   dont like wiz khalifa look sexi I don't like Wiz Khalifa but he looks sexy.
3   wiz khalifa 2.55856E+17 1.34984E+12 502086440   feel like wiz khalifa right now I feel like wiz Khalifa right now..
3   wiz khalifa 2.55866E+17 1.34984E+12 446056049   like wiz khalifa hes ador realli look like hot cheeto man thingi    I like Wiz Khalifa he's adorable, but he really do look like the hot cheeto man thingy
3   wiz khalifa 2.55883E+17 1.34984E+12 67747115    np ne yo ft wiz khalifa dont make em like   #Np Ne-Yo ft. Wiz Khalifa - They don't make em like you 

Update: How can I remove these duplicates?

Upvotes: 0

Views: 273

Answers (3)

SerhiiK

Reputation: 861

EDITED: This version removes all duplicates and keeps only the first row for each tweet token.

public static void eventDetectionToken(String fileInput, String fileOuput)
        throws FileNotFoundException, IOException {
    FileWriter newCsv = new FileWriter(fileOuput + "." + "csv");
    BufferedWriter newCsvBW = new BufferedWriter(newCsv);
    BufferedReader reader = new BufferedReader(new FileReader(fileInput + ".csv"));
    String data;

    try {
        List<String> existanceTokens = new ArrayList<String>();

        do {
            data = reader.readLine();
            String tweetToken = null;

            if (data != null) {
                String[] splitText = data.split(",");
                tweetToken = splitText[5];

                if (!(existanceTokens.contains(tweetToken))) {
                    newCsvBW.append(data);
                    newCsvBW.append("\n");
                    existanceTokens.add(tweetToken);
                }
            }
        } while (data != null);
    } finally {
        newCsvBW.close();
        reader.close();
    }
}
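
A possible variation on the method above, offered as a sketch rather than a drop-in replacement: List.contains rescans the whole list for every row, while a HashSet answers the same question in roughly constant time, and Set.add already returns false for a value it has seen before. The class name TokenDeduplicator, the method dedupeByColumn, and the sample rows are made up for illustration. The sketch works on lines held in memory and assumes comma-separated fields, as the split(",") call above does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class TokenDeduplicator {

    // Keeps the first row for each distinct value found in the given column.
    // Rows with too few columns are skipped instead of throwing
    // ArrayIndexOutOfBoundsException (an assumption about the desired behavior).
    static List<String> dedupeByColumn(List<String> rows, int column) {
        Set<String> seen = new HashSet<>();
        List<String> kept = new ArrayList<>();
        for (String row : rows) {
            String[] fields = row.split(",");
            if (fields.length <= column) {
                continue;
            }
            // Set.add returns false when the value is already present.
            if (seen.add(fields[column])) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Hypothetical sample rows in the seven-column layout from the question.
        List<String> rows = Arrays.asList(
                "1,rick ross,t1,ts,u1,rick ross dice pineappl,Rick Ross x diced pineapples",
                "1,rick ross,t2,ts,u2,rick ross trend,Why is Rick Ross trending?",
                "1,rick ross,t3,ts,u3,rick ross dice pineappl,Rick Ross - Diced Pineapples");
        for (String row : dedupeByColumn(rows, 5)) {
            System.out.println(row);
        }
    }
}
```

In the file-based method, the same idea would mean replacing the ArrayList with a HashSet and writing the row only when add returns true.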

However, if you first want to create a CSV file of duplicates by [name entity] and then, based on that file, create a second one of duplicates by [tweet token], you need to change inputCSV to newNamefile in the second eventDetection invocation, like this:

    eventDetection(inputCSV, newNamefile, 1);
    eventDetection(newNamefile, newTokenfile, 5);

Hope it helps.

Upvotes: 1

phuong_buon

Reputation: 89

        // csvFilePath1 is the input file and csvFilePath the output file;
        // both are expected to be defined by the surrounding code.
        String csvFile = csvFilePath1;
        BufferedReader br = null;
        BufferedReader br1 = null;
        String line = "";
        String csv = csvFilePath;
        FileWriter fileWriter = null;
        try {
            fileWriter = new FileWriter(csv);
        } catch (IOException e) {
            e.printStackTrace();
        }
        // A HashSet holds each distinct line at most once.
        HashSet<String> lines = new HashSet<>();
        try {
            br = new BufferedReader(new FileReader(csvFile));
            br1 = new BufferedReader(new FileReader(csvFilePath1));
            int headerRow = 10;
            // Copy the first headerRow + 1 lines of the input unchanged.
            for (int i = 0; i <= headerRow; i++) {
                fileWriter.append(br1.readLine() + "\n");
            }
            br1.close();
            while ((line = br.readLine()) != null) {
                // add() returns false for a line that was already seen;
                // the size check also skips the first four distinct lines.
                if (lines.add(line) && lines.size() >= 5) {
                    fileWriter.append(line);
                    fileWriter.append("\n");
                }
            }
            fileWriter.flush();
            fileWriter.close();
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            if (br != null) {
                try {
                    br.close();
                } catch (IOException e) {
                    e.printStackTrace();
                }
            }
        }

Upvotes: 0

Sync

Reputation: 3797

Why is FileReader unable to read newNamefile?

That is because the variable newNamefile in

BufferedReader reader = new BufferedReader(new FileReader(newNamefile));

does not exist in the scope of EventDetectioncopy#eventDetectionToken.

Proposed solution

Change the variable to match the parameter in the method:

BufferedReader reader = new BufferedReader(new FileReader(filename));
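
To make the scoping point concrete, here is a small sketch: a local variable of main is not visible inside another method, so the method has to open whatever name it receives as a parameter. The class ScopeDemo, the method countLines, and the temporary file are invented for the example and are not taken from the question's code.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

final class ScopeDemo {

    // The method only sees its own parameter, not the variables of its caller.
    static int countLines(String filename) throws IOException {
        int count = 0;
        // try-with-resources closes the reader even if readLine throws.
        try (BufferedReader reader = new BufferedReader(new FileReader(filename))) {
            while (reader.readLine() != null) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // newNamefile exists only inside main; it reaches countLines as an argument.
        Path tmp = Files.createTempFile("names", ".csv");
        Files.write(tmp, Arrays.asList("1,rick ross", "2,tyler oakley"), StandardCharsets.UTF_8);
        String newNamefile = tmp.toString();
        System.out.println(countLines(newNamefile));
        Files.delete(tmp);
    }
}
```

Referring to newNamefile inside countLines would not compile, which is the kind of error described above.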

Upvotes: 0
