Reputation: 2237
I have a text file (T1.txt) that contains a few strings. Two of them are the same apart from case. I need to drop one of those two and keep the rest.
e.g. ABCD, XYZ, pqrs, aBCd.
I am using a Set to return the strings, but how can I ignore the duplicate and return only one of the two (either ABCD or aBCd)?
public static Set<String> findDuplicates(File inputFile)
{
    FileInputStream fis = null;
    BufferedInputStream bis = null;
    DataInputStream dis = null;
    Set<String> set = new HashSet<String>();
    ArrayList<String> inpArrayList = new ArrayList<String>();
    try {
        fis = new FileInputStream(inputFile);
        bis = new BufferedInputStream(fis);
        dis = new DataInputStream(bis);
        while (dis.available() != 0)
        {
            inpArrayList.add(dis.readLine());
        }
        for (int i = 0; i < inpArrayList.size(); i++)
        {
            if (!set.contains(inpArrayList.get(i)))
                set.add(inpArrayList.get(i));
        }
    }
    catch (FileNotFoundException e) {
        e.printStackTrace();
    } catch (IOException e) {
        e.printStackTrace();
    }
    System.out.println(" set" + set);
    return set;
}
The returned set should contain XYZ, pqrs, and either aBCd or ABCD, but not both.
Thanks, Ramm
Upvotes: 0
Views: 276
Reputation: 3963
You could use a TreeSet with the String.CASE_INSENSITIVE_ORDER comparator, which I find more elegant than the suggested HashMap solutions:
Set<String> set = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
set.add("abc");
set.add("AbC");
set.add("aBc");
set.add("DEF");
System.out.println(set); // => "[abc, DEF]"
Note that iterating over this set gives you the elements in case-insensitive lexicographical order. If you also want to preserve the insertion order, I'd maintain a List on the side like this:
Set<String> set = new TreeSet<String>(String.CASE_INSENSITIVE_ORDER);
List<String> inOrder = new ArrayList<String>();
// when adding stuff inside your loop:
if (set.add(someString)) { // returns true if it was added to the set
    inOrder.add(someString);
}
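Applied to the file from the question, the same idea might look like the sketch below. It assumes one string per line and UTF-8 text; the class and method names (CaseInsensitiveLines, dedupe, fromFile) are made up for illustration.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper class, not part of the original answer.
final class CaseInsensitiveLines {

    // Keeps the first spelling of each string, in first-seen order.
    static List<String> dedupe(Iterable<String> inputs) {
        Set<String> seen = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        List<String> inOrder = new ArrayList<>();
        for (String s : inputs) {
            // add() returns false when an equal-ignoring-case string is already present
            if (seen.add(s)) {
                inOrder.add(s);
            }
        }
        return inOrder;
    }

    // Assumption: the file holds one string per line, encoded as UTF-8.
    static List<String> fromFile(Path file) throws IOException {
        return dedupe(Files.readAllLines(file, StandardCharsets.UTF_8));
    }
}
```

With the question's input (ABCD, XYZ, pqrs, aBCd) this keeps ABCD, XYZ and pqrs, because ABCD is seen before aBCd.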
Upvotes: 2
Reputation: 14045
If the case of the output is not important you could use a custom FilterInputStream to do the conversion.
bis = new BufferedInputStream(fis);
fltis = new LowerCaseInputStream(bis);
dis = new DataInputStream(fltis);
An example of LowerCaseInputStream comes from here.
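In case the linked example is unavailable, here is a rough sketch of what such a filter could look like. It is not the linked class, only an illustration, and it assumes the file is ASCII or another encoding in which the bytes for A to Z never occur inside a multi-byte character (UTF-8 qualifies; only the ASCII letters are converted).

```java
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical sketch; the class linked in the answer may differ.
final class LowerCaseInputStream extends FilterInputStream {

    LowerCaseInputStream(InputStream in) {
        super(in);
    }

    @Override
    public int read() throws IOException {
        int b = super.read();
        return b == -1 ? -1 : lower(b);
    }

    @Override
    public int read(byte[] buf, int off, int len) throws IOException {
        int n = super.read(buf, off, len);
        // n is -1 at end of stream, in which case the loop body never runs
        for (int i = off; i < off + n; i++) {
            buf[i] = (byte) lower(buf[i] & 0xFF);
        }
        return n;
    }

    // Maps ASCII 'A'..'Z' to 'a'..'z' and leaves every other byte unchanged.
    private static int lower(int b) {
        return (b >= 'A' && b <= 'Z') ? b + ('a' - 'A') : b;
    }
}
```

Everything read through this stream comes back in lowercase, so the original spelling is lost, which is why it only fits when the case of the output does not matter.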
Upvotes: 0
Reputation: 2218
How about using a HashMap, with the key generated by your own key function? That function would return the string in lowercase.
Shash
Upvotes: 0
Reputation: 3527
Just as said above, I did something similar earlier this week. You can do something like (just adjust it to your code):
HashMap<String, String> set = new HashMap<String, String>();
while (tokenizer.hasMoreTokens())
{
    String element = tokenizer.nextToken();
    String lowerCaseElement = element.toLowerCase();
    // Check the lower-case key, so that ABCD and aBCd count as the same entry
    if (!set.containsKey(lowerCaseElement))
    {
        set.put(lowerCaseElement, element);
    }
}
At the end, the values of the map 'set' will be what you need.
Upvotes: 0
Reputation: 13501
inpArrayList.add(dis.readLine().toLowerCase());
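Using this line in place of the existing add call in your loop should work, though every string in the result will then be lowercase.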
Upvotes: 1
Reputation: 3111
Create a HashMap that uses currentString.toLowerCase() as the key and the original string as the value, so that two strings differing only in case share the same key. Because the value is the original string, printing the values gives you one of the original spellings rather than an all-lowercase version.
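A minimal sketch of this idea follows. It swaps in a LinkedHashMap so the result keeps the input order (a plain HashMap works too if order does not matter), and it passes Locale.ROOT so the lower-casing does not depend on the default locale. The class and method names are invented for the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper class, for illustration only.
final class LowerCaseKeyMap {

    static List<String> uniqueIgnoringCase(Iterable<String> inputs) {
        // key: lower-cased string, value: the first original spelling seen
        Map<String, String> byLowerCase = new LinkedHashMap<>();
        for (String s : inputs) {
            byLowerCase.putIfAbsent(s.toLowerCase(Locale.ROOT), s);
        }
        return new ArrayList<>(byLowerCase.values());
    }
}
```

Using put instead of putIfAbsent would keep the last spelling seen rather than the first; either satisfies the question, which accepts ABCD or aBCd.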
Upvotes: 2
Reputation: 9111
Just store your strings in upper case in a set before adding them to your ArrayList result.
If a string can't be added to the set (because its upper-case form is already there), don't store it in the ArrayList.
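Roughly, and with names made up for the example, that could be written as:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper class, for illustration only.
final class UpperCaseSeenSet {

    static List<String> uniqueIgnoringCase(Iterable<String> inputs) {
        Set<String> seenUpper = new HashSet<>(); // upper-cased forms seen so far
        List<String> result = new ArrayList<>(); // original spellings, in input order
        for (String s : inputs) {
            // add() returns false if the upper-cased form was already in the set
            if (seenUpper.add(s.toUpperCase(Locale.ROOT))) {
                result.add(s);
            }
        }
        return result;
    }
}
```

The set is only used for the membership check; the ArrayList is what gets returned, so the original case survives.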
Upvotes: 0
Reputation: 14336
Convert every string to lowercase before inserting it into the set, and then the set will take care of the uniqueness for you.
(If you also need to preserve the case of the input (returning abcd for AbCd is not acceptable), then you need a second set that stores lower-case variants and use checks on the second set to decide whether or not to add strings to the result set. Same principle, but one more step to program.)
Upvotes: 0
Reputation: 44801
You can use the old trick of calling .toLowerCase() on each string before putting it in the set.
And if you want to keep the original case, switch to a HashMap from the lower-case form to the original form, then iterate over the values.
Upvotes: 0