user2569824

Reputation: 3

How to ignore duplicate strings when using RegEx to match string?

EDIT: Edited for clarity about what I'm having trouble with. I'm not getting the right results because my code is counting duplicates. I HAVE to use RegEx; I could also use a tokenizer, but I did not.

What I am trying to do: there are 5 input files, and I need to count how many "USER DEFINED VARIABLES" there are. Please ignore the messy code, I'm just learning Java.

I replaced: everything within ( and ), all non-word characters, any keywords and names such as int, main etc., and any digit with a space in front of it. Then I replaced every blank space with a new line and trimmed the result.

This leaves me with a list of strings that I match with my RegEx. However, at this point, how do I make my count include only unique identifiers?


EXAMPLE: For the 3rd input file, which I have attached beneath the code, I am receiving "distinct/unique identifiers: 10" in my output file, when it should be "distinct/unique identifiers: 3".

For the 5th input file, also attached, I should have "distinct/unique identifiers: 3", but I currently have "distinct/unique identifiers: 6".

I cannot use Set, Map etc.

Any help is great! Thanks.

import java.util.*;
import java.util.regex.*;
import java.io.*;

public class A1_123456789 {

public static void main(String[] args) throws IOException {
    if (args.length < 1) {
        System.out.println("Wrong number of arguments");
        System.exit(1);
    }

    for (int i = 0; i < args.length; i++) {

        FileReader jk = new FileReader(args[i]);
        BufferedReader ij = new BufferedReader(jk);
        FileWriter fw = null;
        BufferedWriter bw = null;

        String regex = "\\b(\\w+)(\\s+\\1\\b)+";

        Pattern p = Pattern.compile("[_a-zA-Z][_a-zA-Z0-9]{0,30}");

        String line;
        int count = 0;

        while ((line = ij.readLine()) != null) {
           line = line.replaceAll("\\(([^\\)]+)\\)", " " );
           line = line.replaceAll("[^\\w]", " ");
           line = line.replaceAll("\\bint\\b|\\breturn\\b|\\bmain\\b|\\bprintf\\b|\\bif\\b|\\belse\\b|\\bwhile\\b", " ");
           line = line.replaceAll(" \\d", "");
           line = line.replaceAll(" ", "\n");
           line = line.trim();

            Matcher m = p.matcher(line);

            while (m.find()) {
                count++;
            }
        }

        try {
            String s1 = args[i];
            String s2 = s1.replaceAll("input","output");
            fw = new FileWriter(s2);
            bw = new BufferedWriter(fw);
            bw.write("distinct/unique identifiers: " + count);

        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            try {
                if (bw != null) {
                    bw.close();
                }

                if (fw != null) {
                    fw.close();
                }

            } catch (IOException ex) {
                ex.printStackTrace();
            }
        }
    }
}
}

//This is the 3rd input file below.

int celTofah(int cel)
{
    int fah;
    fah = 1.8*cel+32;
    return fah;
}

int main()
{
    int cel, fah;
    cel = 25;
    fah = celTofah(cel);
    printf("Fah: %d", fah);
    return 0;
}

//This is the 5th input file below.

int func2(int i)
{
    while(i<10)
    {
        printf("%d\t%d\n", i, i*i);
        i++;
    }
}

int func1()
{
    int i = 0;
    func2(i);
}

int main()
{
    func1();
    return 0;
}

Upvotes: 0

Views: 211

Answers (2)

Ken Y-N

Reputation: 15009

Amal Dev has suggested a correct implementation, but given the OP wants to keep Matcher, we have:

// Previous code to here

// Linked list of unique entries (declare it inside the for loop so each file starts with an empty list)
LinkedList<String> uniqueMatches = new LinkedList<String>();

// Existing code
while ((line = ij.readLine()) != null) {
    line = line.replaceAll("\\(([^\\)]+)\\)", " " );
    line = line.replaceAll("[^\\w]", " ");
    line = line.replaceAll("\\bint\\b|\\breturn\\b|\\bmain\\b|\\bprintf\\b|\\bif\\b|\\belse\\b|\\bwhile\\b", " ");
    line = line.replaceAll(" \\d", "");
    line = line.replaceAll(" ", "\n");
    line = line.trim();

    Matcher m = p.matcher(line);

    while (m.find()) {
        // New code - get this match
        String thisMatch = m.group();
        // If we haven't seen this string before, add it to the list
        if(!uniqueMatches.contains(thisMatch))
            uniqueMatches.add(thisMatch);
    }
}

// Now see how many unique strings we have collected
count = uniqueMatches.size();

Note I haven't compiled this, but hopefully it works as is...
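
For anyone who wants to try the idea without the input files, here is a minimal, self-contained sketch of the same "collect each match once" approach. The class name `UniqueIdentifierSketch`, the method `countUnique`, and the helper `isSkipped` are made up for illustration. It also makes three assumptions that are not in the question's code: a `List` is allowed even though `Set` and `Map` are not, an `ArrayList` stands in for the `LinkedList`, and string literals are stripped first so that words inside `printf` format strings are not counted (this simple stripping does not handle escaped quotes).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class UniqueIdentifierSketch {

    // Same identifier pattern as in the question
    private static final Pattern IDENTIFIER =
            Pattern.compile("[_a-zA-Z][_a-zA-Z0-9]{0,30}");

    // Names to skip, taken from the replaceAll call in the question
    private static final String[] SKIP =
            {"int", "return", "main", "printf", "if", "else", "while"};

    // Hypothetical helper: true if the word is one of the skipped names
    static boolean isSkipped(String word) {
        for (String s : SKIP) {
            if (s.equals(word)) {
                return true;
            }
        }
        return false;
    }

    // Counts each identifier once, no matter how often it is matched
    static int countUnique(String source) {
        // Assumption: drop "..." literals so format strings are ignored
        String cleaned = source.replaceAll("\"[^\"]*\"", " ");

        List<String> seen = new ArrayList<>();
        Matcher m = IDENTIFIER.matcher(cleaned);
        while (m.find()) {
            String id = m.group();
            // Only remember a name the first time it shows up
            if (!isSkipped(id) && !seen.contains(id)) {
                seen.add(id);
            }
        }
        return seen.size();
    }

    public static void main(String[] args) {
        // The 3rd input file from the question, as one string
        String third = String.join("\n",
                "int celTofah(int cel)",
                "{",
                "    int fah;",
                "    fah = 1.8*cel+32;",
                "    return fah;",
                "}",
                "",
                "int main()",
                "{",
                "    int cel, fah;",
                "    cel = 25;",
                "    fah = celTofah(cel);",
                "    printf(\"Fah: %d\", fah);",
                "    return 0;",
                "}");
        // prints: distinct/unique identifiers: 3
        System.out.println("distinct/unique identifiers: " + countUnique(third));
    }
}
```

The `contains` check scans the whole list for every match, which is fine for small files like these.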

Upvotes: 0

Amal Dev

Reputation: 84

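Try this:

// Declare the list once per input file, before the readLine() loop
LinkedList<String> dtaa = new LinkedList<String>();

// Inside the loop: the cleanup turned spaces into newlines, so split on any whitespace
String[] parts = line.split("\\s+");
for (int ii = 0; ii < parts.length; ii++) {
    // Skip empty tokens and anything already collected
    if (parts[ii].isEmpty() || dtaa.contains(parts[ii]))
        continue;
    dtaa.add(parts[ii]);
}

// After the loop
count = dtaa.size();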

instead of

 Matcher m = p.matcher(line);

        while (m.find()) {
            count++;
        }
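
If lists turn out to be off-limits as well, the de-duplication can in principle be pushed into the regex itself: match a word only when the same word does not occur again later in the text, so every name is counted exactly once, at its last occurrence. This is a different technique from the one above, offered only as a sketch. The class name `LastOccurrenceSketch` and the method `countUnique` are made up, and it assumes the whole cleaned file (keywords already removed) has been collected into one string instead of being matched line by line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LastOccurrenceSketch {

    // A whole word that is NOT followed, anywhere later in the text,
    // by the same whole word. [\s\S]* is used so newlines are crossed too.
    private static final Pattern LAST_OCCURRENCE =
            Pattern.compile("\\b([_a-zA-Z]\\w*)\\b(?![\\s\\S]*\\b\\1\\b)");

    // Counts distinct words by counting only their last occurrences
    static int countUnique(String cleanedText) {
        Matcher m = LAST_OCCURRENCE.matcher(cleanedText);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Assumed sample of already-cleaned text, one token per line
        String cleaned = "celTofah\ncel\nfah\nfah\ncel\nfah\ncel\nfah";
        System.out.println("distinct/unique identifiers: " + countUnique(cleaned));
    }
}
```

The lookahead rescans the rest of the text for every word, so this gets slow on large inputs; for files of this size it should not matter.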

Upvotes: 1
