Mohsin Rasheed

Reputation: 48

Saving tokens into an ArrayList and comparing them in Java

I am working on a plagiarism detection system, and for that I need to compare two strings and show how similar they are.

I have two strings that I have split into tokens separated by spaces. Now I want to save the tokens in ArrayLists so that I can compare them and show the result for each index, in sequence.

My source code is:

public static void main(String[] args) {
    // TODO code application logic here

    String str1 = "This is String number one";
    String str2 = "This is String number two";

    StringTokenizer st1 = new StringTokenizer(str1);
    StringTokenizer st2 = new StringTokenizer(str2);

    System.out.println("---Split by space---");
    ArrayList<String> list1 = new ArrayList<String>();

    list1.add(str1);

// was trying to save the tokens in arraylist...

    ArrayList<String> list2 = new ArrayList<String>();
    list2.add(str2);

    for (String number : list1) {
        System.out.println("String 1 = " + number);
    }
    for (String number : list2) {
        System.out.println("String 2 = " + number);
    }

}
}

Any Suggestions/Examples would be helpful.

Upvotes: 1

Views: 2761

Answers (6)

Jishnu Prathap

Reputation: 2033

Your code never uses the string tokenizers st1 and st2; it adds the whole strings str1 and str2 to your ArrayLists. I am not sure what you are trying to achieve, but judging from your comment "// was trying to save the tokens in arraylist...", you meant to add every token from the tokenizer rather than the string itself.

Change this part of your code:

// was trying to save the tokens in arraylist...

    ArrayList<String> list2 = new ArrayList<String>();
    list2.add(str2);

to:

    // was trying to save the tokens in arraylist...
    ArrayList<String> list2 = new ArrayList<String>();
    while (st2.hasMoreTokens()) { // iterate over the string tokens
        list2.add(st2.nextToken()); // nextToken() consumes a token, so the loop terminates
    }

Upvotes: 1

Terrible Tadpole

Reputation: 634

This does everything you seem to be asking for, and it also copes with lists of different lengths:

import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class StringTokenCompare {

    void compareStringTokens(String s1, String s2) {
        List<String> l1 = Arrays.asList(s1.split(" "));
        List<String> l2 = Arrays.asList(s2.split(" "));
        Iterator<String> i1 = l1.iterator();
        Iterator<String> i2 = l2.iterator();
        // Token count of the longer line
        int totalItems = Math.max(l1.size(), l2.size());
        int matchCount = 0;
        // Walk both lists in step; the loop stops at the end of the shorter one
        while (i1.hasNext() && i2.hasNext()) {
            String t1 = i1.next();
            String t2 = i2.next();
            if (t1.equals(t2)) {
                ++matchCount;
            }
        }
        System.out.format("Tokens in longer line: %d%n", totalItems);
        System.out.format("Matching tokens:       %d%n", matchCount);
    }

}

BUT, the fact that the lists may be of different sizes should start you thinking about issues that you have to cope with if you're serious about detecting plagiarism.

  1. What if a word has been inserted or deleted so it shifts the words in one of the lists? You'll get a low match count on very similar lines.
  2. What if the word order has been rearranged?

My suggestion - beyond the scope of the original question of course - is that you should seriously consider edit distance between the token lists rather than a naive item-by-item comparison. An Internet search will quickly locate a simple edit distance algorithm.
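As a rough illustration of that suggestion, here is a minimal sketch of a Levenshtein-style edit distance computed over whole tokens instead of characters. The class name TokenEditDistance and the method name distance are made up for this example, and the second sample sentence is invented to show an inserted word. It is one simple way to do it, not the only one.

```java
import java.util.Arrays;
import java.util.List;

public class TokenEditDistance {

    // Hypothetical helper: minimum number of token insertions, deletions
    // and substitutions needed to turn list a into list b.
    static int distance(List<String> a, List<String> b) {
        int[] prev = new int[b.size() + 1];
        int[] curr = new int[b.size() + 1];
        // Turning an empty list into the first j tokens of b costs j insertions
        for (int j = 0; j <= b.size(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.size(); i++) {
            curr[0] = i; // deleting the first i tokens of a
            for (int j = 1; j <= b.size(); j++) {
                int cost = a.get(i - 1).equals(b.get(j - 1)) ? 0 : 1;
                int insert = curr[j - 1] + 1;
                int delete = prev[j] + 1;
                int substitute = prev[j - 1] + cost;
                curr[j] = Math.min(Math.min(insert, delete), substitute);
            }
            // Reuse the two rows instead of allocating a full matrix
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.size()];
    }

    public static void main(String[] args) {
        List<String> l1 = Arrays.asList("This is String number one".split(" "));
        List<String> l2 = Arrays.asList("This is not String number one".split(" "));
        System.out.println("Token edit distance: " + distance(l1, l2));
    }
}
```

For these two sample lines the distance is 1 (one inserted word), whereas an index-by-index comparison would report only two matching tokens out of six.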

Upvotes: 0

Shekhar Khairnar

Reputation: 2691

This code snippet finds the words of str1 that also occur in str2:

public static void main(String[] args) {
    List<String> repetWords = new ArrayList<String>();
    String str1 = "This is String number one";
    String str2 = "This is String number two";

    // Split each string on spaces and copy the tokens into a list
    String[] array = str1.split(" ");
    List<String> list = new ArrayList<String>(array.length);
    Collections.addAll(list, array);

    String[] array2 = str2.split(" ");
    List<String> list2 = new ArrayList<String>(array2.length);
    Collections.addAll(list2, array2);

    // Collect every token of str1 that appears anywhere in str2
    for (String string : list) {
        if (list2.contains(string)) {
            repetWords.add(string);
        }
    }
    System.out.println("repeated words in str2");

    for (String rptWords : repetWords) {
        System.out.println(rptWords);
    }
}

Upvotes: 1

DeiAndrei

Reputation: 947

If you want to add the tokens to your list, you have to iterate over the StringTokenizer; adding the original string puts just one element in the list.

For example:

public static void main(String[] args) {

    String str1 = "This is String number one";
    StringTokenizer st1 = new StringTokenizer(str1);
    ArrayList<String> list1 = new ArrayList<String>();

    //Iterate over all tokens and add them to your list
    while (st1.hasMoreTokens()) {
        list1.add(st1.nextToken());
    }

    System.out.println("List 1 tokens: ");
    for (String element : list1) {
        System.out.println("\t" + element);
    }

    System.out.println("There are " + list1.size() + " tokens");
}

The output is:

List 1 tokens: 
    This
    is
    String
    number
    one
There are 5 tokens

Upvotes: 1

Prashant

Reputation: 4624

This should work. It compares every token in list1 with every token in list2:

for (String token1 : list1) {
    for (String token2 : list2) {
        // code to compare two tokens
    }
}

Also, if you want to compare the elements at the same index, both lists must have the same size; otherwise list2.get(index) throws an IndexOutOfBoundsException when list2 is shorter:

for (int index = 0; index < list1.size(); index++) {
    String token1 = list1.get(index);
    String token2 = list2.get(index);
    // code to compare tokens
}
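One way to make the index-based loop safe for lists of different sizes is to stop at the shorter list. The following is only a sketch: the class name IndexCompare and the method countMatches are invented for illustration, and the "matches out of the longer list" ratio is just one possible way to report similarity.

```java
import java.util.Arrays;
import java.util.List;

public class IndexCompare {

    // Hypothetical helper: counts the positions at which both lists
    // hold an equal token, looking only at indexes that exist in both.
    static int countMatches(List<String> list1, List<String> list2) {
        int limit = Math.min(list1.size(), list2.size());
        int matches = 0;
        for (int index = 0; index < limit; index++) {
            if (list1.get(index).equals(list2.get(index))) {
                matches++;
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<String> list1 = Arrays.asList("This is String number one".split(" "));
        List<String> list2 = Arrays.asList("This is String number two".split(" "));
        int matches = countMatches(list1, list2);
        int longer = Math.max(list1.size(), list2.size());
        System.out.println(matches + " of " + longer + " tokens match");
        // prints "4 of 5 tokens match"
    }
}
```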

Upvotes: 0

griFlo

Reputation: 2164

Do you need the StringTokenizer?

 String str1 = "This is String number one";
 String str2 = "This is String number two";

 List<String> list1 = Arrays.asList(str1.split(" "));
 List<String> list2 = Arrays.asList(str2.split(" "));
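One difference from StringTokenizer is worth knowing: split(" ") produces empty tokens when the text contains consecutive spaces, while StringTokenizer skips them. A possible workaround, sketched below with a made-up helper named tokens, is to trim the input and split on the regular expression \s+ (one or more whitespace characters).

```java
import java.util.Arrays;
import java.util.List;

public class SplitTokens {

    // Hypothetical helper: splits on runs of whitespace so that
    // repeated spaces do not produce empty tokens.
    static List<String> tokens(String text) {
        return Arrays.asList(text.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        String text = "one  two"; // two spaces between the words
        System.out.println(Arrays.asList(text.split(" "))); // prints [one, , two]
        System.out.println(tokens(text));                   // prints [one, two]
    }
}
```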

Upvotes: 1
