Reputation: 48
I am working on a plagiarism detection system, and for that I need to compare two strings and show how similar they are.
I have two strings that I have split into tokens separated by spaces. Now I want to save the tokens in an ArrayList so that I can compare them and show the result for each index in sequence.
My source code is:
public static void main(String[] args) {
// TODO code application logic here
String str1 = "This is String number one";
String str2 = "This is String number two";
StringTokenizer st1 = new StringTokenizer(str1);
StringTokenizer st2 = new StringTokenizer(str2);
System.out.println("---Split by space---");
ArrayList<String> list1 = new ArrayList<String>();
list1.add(str1);
// was trying to save the tokens in arraylist...
ArrayList<String> list2 = new ArrayList<String>();
list2.add(str2);
for (String number : list1) {
System.out.println("String 1 = " + number);
}
for (String number : list2) {
System.out.println("String 2 = " + number);
}
}
Any Suggestions/Examples would be helpful.
Upvotes: 1
Views: 2761
Reputation: 2033
Your code never uses the string tokenizers st1 and st2; you are adding the whole strings str1 and str2 to your ArrayLists. I am not sure what you are trying to achieve, but your comment "// was trying to save the tokens in arraylist..." suggests you meant to add every token from the StringTokenizer to the list rather than the string itself.
Change this part of your code
// was trying to save the tokens in arraylist...
ArrayList<String> list2 = new ArrayList<String>();
list2.add(str2);
to
// was trying to save the tokens in arraylist...
ArrayList<String> list2 = new ArrayList<String>();
while (st2.hasMoreTokens()) { // you need to iterate over the string tokens
    list2.add(st2.nextToken()); // nextToken() advances the tokenizer, so the loop terminates
}
Upvotes: 1
Reputation: 634
This does everything you seem to be asking for, and it also copes with lists of different lengths:
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class StringTokenCompare {
    void compareStringTokens(String s1, String s2) {
        List<String> l1 = Arrays.asList(s1.split(" "));
        List<String> l2 = Arrays.asList(s2.split(" "));
        Iterator<String> i1 = l1.iterator();
        Iterator<String> i2 = l2.iterator();
        // Size of the longer list, so the match count can be judged against it
        int totalItems = Math.max(l1.size(), l2.size());
        int matchCount = 0;
        // Walk both lists in step; the loop stops at the end of the shorter one
        while (i1.hasNext() && i2.hasNext()) {
            String t1 = i1.next();
            String t2 = i2.next();
            if (t1.equals(t2)) {
                ++matchCount;
            }
        }
        System.out.format("Tokens in longer line: %d%n", totalItems);
        System.out.format("Matching tokens: %d%n", matchCount);
    }
}
BUT, the fact that the lists may be of different sizes should start you thinking about issues that you have to cope with if you're serious about detecting plagiarism.
My suggestion - beyond the scope of the original question of course - is that you should seriously consider edit distance between the token lists rather than a naive item-by-item comparison. An Internet search will quickly locate a simple edit distance algorithm.
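To make the edit distance suggestion a bit more concrete, here is a minimal sketch of Levenshtein distance computed over whole tokens rather than characters. The class name TokenEditDistance and the helper tokens() are made up for illustration, and splitting on runs of whitespace is only an assumption about how you want to tokenize. Treat it as a starting point, not a finished plagiarism metric.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper class, named for illustration only.
final class TokenEditDistance {

    // Levenshtein distance where the unit of comparison is a token, not a character.
    static int distance(List<String> a, List<String> b) {
        int[] prev = new int[b.size() + 1];
        int[] curr = new int[b.size() + 1];
        for (int j = 0; j <= b.size(); j++) {
            prev[j] = j; // building the first j tokens of b from nothing takes j insertions
        }
        for (int i = 1; i <= a.size(); i++) {
            curr[0] = i; // reducing the first i tokens of a to nothing takes i deletions
            for (int j = 1; j <= b.size(); j++) {
                int substitution = prev[j - 1] + (a.get(i - 1).equals(b.get(j - 1)) ? 0 : 1);
                int deletion = prev[j] + 1;
                int insertion = curr[j - 1] + 1;
                curr[j] = Math.min(substitution, Math.min(deletion, insertion));
            }
            // Reuse the two rows instead of allocating a full matrix
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.size()];
    }

    // Assumption: tokens are separated by runs of whitespace.
    static List<String> tokens(String s) {
        return Arrays.asList(s.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        List<String> l1 = tokens("This is String number one");
        List<String> l2 = tokens("This is String number two");
        System.out.println("Token edit distance: " + distance(l1, l2));
    }
}
```

For the two example strings this reports a distance of 1 (one substituted token). Unlike the index-by-index comparison, an inserted or deleted word costs only 1 here instead of pushing every later token out of alignment.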
Upvotes: 0
Reputation: 2691
This code snippet collects the words from str1 that also appear anywhere in str2:
List<String> repetWords = new ArrayList<String>();
String str1 = "This is String number one";
String str2 = "This is String number two";
String[] array = str1.split(" ");
List<String> list = new ArrayList<String>(array.length);
Collections.addAll(list, array);
String[] array2 = str2.split(" ");
List<String> list2 = new ArrayList<String>(array2.length);
Collections.addAll(list2, array2);
for (String string : list) {
if(list2.indexOf(string) != -1){
repetWords.add(string);
}
}
System.out.println("repeated words in str2");
for (String rptWords : repetWords) {
System.out.println(rptWords);
}
Upvotes: 1
Reputation: 947
If you want to add the tokens to your list, you have to iterate over the StringTokenizer and add each token. Adding the original string to the list just stores it as a single element.
For example:
public static void main(String[] args) {
String str1 = "This is String number one";
StringTokenizer st1 = new StringTokenizer(str1);
ArrayList<String> list1 = new ArrayList<String>();
//Iterate over all tokens and add them to your list
while (st1.hasMoreTokens()) {
list1.add(st1.nextToken());
}
System.out.println("List 1 tokens: ");
for (String element : list1) {
System.out.println("\t" + element);
}
System.out.println("There are " + list1.size() + " tokens");
}
The output is:
List 1 tokens:
	This
	is
	String
	number
	one
There are 5 tokens
Upvotes: 1
Reputation: 4624
This should work if you want to compare every token in list1 with every token in list2:
for (String token1 : list1) {
for (String token2 : list2) {
// code to compare two tokens
}
}
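Also, if you want to compare elements at the same index, both lists should have the same size. Otherwise list2.get(index) throws an IndexOutOfBoundsException when list2 is shorter than list1:
for (int index = 0; index < list1.size(); index++) {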
String token1 = list1.get(index);
String token2 = list2.get(index);
// code to compare tokens
}
Upvotes: 0
Reputation: 2164
Do you need the StringTokenizer?
String str1 = "This is String number one";
String str2 = "This is String number two";
List<String> list1 = Arrays.asList(str1.split(" "));
List<String> list2 = Arrays.asList(str2.split(" "));
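Two caveats if you go the split route. split(" ") produces empty tokens when words are separated by more than one space, and Arrays.asList returns a fixed-size list, so calling add or remove on it throws UnsupportedOperationException. Below is a small sketch of one way around both. The class and method names are invented for illustration, and whether you want to collapse whitespace runs depends on your input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class, named for illustration only.
final class SplitDemo {

    // Splits on every single space, as in the snippet above.
    static List<String> splitOnSingleSpace(String s) {
        return new ArrayList<>(Arrays.asList(s.split(" ")));
    }

    // Splits on runs of whitespace (spaces, tabs, newlines) after trimming the ends.
    static List<String> splitOnWhitespaceRuns(String s) {
        return new ArrayList<>(Arrays.asList(s.trim().split("\\s+")));
    }

    public static void main(String[] args) {
        String text = "This  is String"; // note the two spaces after "This"
        System.out.println(splitOnSingleSpace(text));    // prints [This, , is, String]
        System.out.println(splitOnWhitespaceRuns(text)); // prints [This, is, String]

        // Copying into an ArrayList makes the list resizable
        List<String> tokens = splitOnWhitespaceRuns(text);
        tokens.add("extra");
        System.out.println(tokens.size()); // prints 4
    }
}
```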
Upvotes: 1