Reputation: 415
This method is supposed to take a FileInputStream and character by character, it is supposed to return a StringBuilder of all the 0's and 1's that lead to that character in this Huffman tree. However, I am having some trouble in which it just returns an instance of every single path.
For example, If I have the tree:
(10)
(4) (6)
(2=' ') (2) (3='a') (3='b')
(1=EOF) (1='c')
The file:
ab ab cab
Would return a lot more 1's and 0's than expected. I've tested my build tree methods and they seem to work. However, I assume something is wrong with my recursive method, compress(). I believe that it's because when it reaches a leaf that doesn't contain the character desired, it still returns a string of the path to that leaf still. Because of this, it will return a lot more than expected. If that's true, then how do I eliminate the path the that leaf if it doesn't match?
Edit: I've been working through this all weekend and here's what I have: The client code which contains a GUI is really long so I omitted it. I've also omitted the methods to print out the tree since there's already enough code here.
import java.io.*;
import java.util.*;
public class HuffmanTree {
public HuffmanNode overallRoot;
Map<Character, Integer> storage; // gets the repeating times of a number
ArrayList<HuffmanNode> nodes; // stores all nodes (will have only one node later)
// constructor
public HuffmanTree(Map<Character, Integer> counts) {
storage = counts; // sets the map to this one // putAll instead?
storage.put((char)4, 1); // put end of file character
storage = sortByValue(storage); // map is now sorted by values
nodes = storeNodes();
createTree();
}
// creates nodes from each key/value in storage map
private ArrayList<HuffmanNode> storeNodes() {
List<Character> characters = new ArrayList<Character>(storage.keySet());
ArrayList<HuffmanNode> output = new ArrayList<HuffmanNode>(); // stores all the nodes
for (Character i: characters) {
HuffmanNode temp = new HuffmanNode(storage.get(i), i);
output.add(temp);
}
return output; // output will be sorted by occurrences
}
// post: helper that sorts the map by value code
// Source: http://stackoverflow.com/questions/109383/how-to-sort-a-mapkey-value-on-the-values-in-java
private static <Character, Integer extends Comparable<? super Integer>> Map<Character, Integer>
sortByValue( Map<Character, Integer> map ) {
List<Map.Entry<Character, Integer>> list =
new LinkedList<Map.Entry<Character, Integer>>( map.entrySet() );
Collections.sort( list, new Comparator<Map.Entry<Character, Integer>>() {
public int compare( Map.Entry<Character, Integer> o1, Map.Entry<Character, Integer> o2 ) {
return (o1.getValue()).compareTo( o2.getValue() );
}
} );
Map<Character, Integer> result = new LinkedHashMap<Character, Integer>();
for (Map.Entry<Character, Integer> entry : list) {
result.put( entry.getKey(), entry.getValue() );
}
return result;
}
// takes stuff from nodes and creates relationships with them
private void createTree() {
do { // keep running until nodes has only one elem
HuffmanNode first = nodes.get(0); // gets the first two nodes
HuffmanNode second = nodes.get(1);
HuffmanNode combined;
combined = new HuffmanNode(first.frequency + second.frequency); // combined huffman node
combined.left = first;
combined.right = second;
nodes.remove(0); // then remove the first two elems from list
nodes.remove(0);
// goes through and adds combined into right spot
boolean addAtEnd = true;
for (int i = 0; i < nodes.size(); i++) {
if (nodes.get(i).frequency > combined.frequency) {
nodes.add(i, combined);
addAtEnd = false; // already added; don't add at end
break;
}
} // need to add at end
if (addAtEnd) {
nodes.add(combined);
}
if (nodes.size() == 1) {
break;
}
} while (nodes.size() > 1);
}
// inputFile is a textFile // puts contents of file onto display window
// nodes need to be made first
// This is responsible for outputting 0's and 1's
public StringBuilder compress(InputStream inputFile) throws IOException {
StringBuilder result = new StringBuilder(); // stores resulting 1's and 0's
byte[] fileContent = new byte[20000000]; // creates a byte[]
inputFile.read(fileContent); // reads the input into fileContent
String storage = new String(fileContent); // contains entire file into this string to process
// need to exclude default value
String storage2 = ""; // keeps chars of value without default values
for (int i = 0; i < storage.length(); i++) {
if (storage.charAt(i) != '\u0000') {
storage2+=storage.charAt(i);
} else {
break;
}
}
for (int i = 0; i < storage2.length(); i++) { // goes through every char to get path
String binary = findPath(storage2.charAt(i));
result.append(binary); // add path to StringBuilder
}
return result;
}
// return a stringbuilder of binary sequence by reading each character, searching the
// tree then returning the path of 0's and 1's
private String findPath(char input) {
return findPath(input, nodes.get(0), "");
}
private String findPath(char input, HuffmanNode root, String path) {
String result = "";
if (!root.isLeaf()) {
result += findPath(input, root.left, path += "0"); // go left
result += findPath(input, root.right, path += "1"); // go right
} if (root.isLeaf()) { // base case If at right leaf
if (input == root.character) {
//System.out.println("found it");
return path;
}
}
return result;
}
}
Here is the individual node class:
import java.io.*;
import java.util.*;
// Stores each character, its number of occurrences, and connects to other nodes
public class HuffmanNode implements Comparable<HuffmanNode>{
public int frequency;
public char character;
public HuffmanNode left;
public HuffmanNode right;
// constructor for leaf
public HuffmanNode(int frequency, char character) {
this.frequency = frequency;
this.character = character;
left = null;
right = null;
}
// constructor for node w/ children
public HuffmanNode(int frequency) {
this.frequency = frequency;
left = null;
right = null;
}
// provides a count of characters in an input file and place in map
public static Map<Character, Integer> getCounts(FileInputStream input) throws IOException {
Map<Character, Integer> output = new TreeMap<Character, Integer>(); // treemap keeps keys in sorted order (chars alphabetized)
byte[] fileContent = new byte[2000000]; // creates a byte[]
//ArrayList<Byte> test = new ArrayList<Byte>();
input.read(fileContent); // reads the input into fileContent
String test = new String(fileContent); // contains entire file into this string to process
//System.out.println(test);
// goes through each character of String to put chars as keys and occurrences as keys
int i = 0;
char temp = test.charAt(i);
while (temp != '\u0000') { // while does not equal default value
if (output.containsKey(temp)) { // seen this character before; increase count
int count = output.get(temp);
output.put(temp, count + 1);
//System.out.println("repeat; char is: " + temp + "count is: " + output.get(temp)); // test
} else { // Haven't seen this character before; create count of 1
output.put(temp, 1);
//System.out.println("new; char is: " + temp + "count is: 1"); // test
}
i++;
temp = test.charAt(i);
}
return output;
}
// sees if this node is a leaf
public boolean isLeaf() {
if (left == null && right == null) {
return true;
}
return false;
}
@Override
public int compareTo(HuffmanNode o) {
// TODO Auto-generated method stub
return 0;
}
}
Upvotes: 0
Views: 4505
Reputation: 440
I suggest you use polymorphism on the two different types of nodes you have. Something like the interpreter design pattern. It will help your code in both readability and efficiency:
abstract class HuffmanNode {
private String code;
public abstract int getFrequency();
public abstract String getCode();
public abstract void generateCodes(String code, Map<Character, String> codes);
...
}
class InternalNode {
private HuffmanNode left;
private HuffmanNode right;
...
public void generateCodes(String code, Map<Character, String> codes) {
left.generateCodes(code + "0", codes);
right.generateCodes(code + "1", codes);
}
...
}
class CharacterNode {
private int frequency;
private char character;
...
public void generateCodes(String code, Map<Character, String> codes) {
codes.put(character, code);
}
...
}
You can fill the codes
for the whole tree like this (run only once):
Map<Character, String> codes = new HashMap<>();
root.generateCodes("", codes);
and then use codes.get(input)
instead of findPath
.
You can have very clean recursive implementations for other methods such as getFrequency()
as well.
Upvotes: 0
Reputation: 148890
The method private String findPath(char input, HuffmanNode root, String path)
is the problem. It seems that you try to use a backtrack to search the tree, but you never unstack anything when going back.
A simple solution is to return null for a leaf that is not the correct character so as keep only the correct path :
private String findPath(char input, HuffmanNode root, String path) {
String result;
if (! root.isLeaf()) {
if ((result = findPath(input, root.left, path + '0')) == null) {
result = findPath(input, root.right, path + '1');
}
}
else {
result = (input == root.character) ? path : null;
}
return result;
}
Tested with your example, correctly gives findPath('a', root, "") = "10"
and findPath('c', root, "") = "011"
But you are doing a sequential search for each character read from input. IMHO, it is more efficient to first create a hash with each character as key and the path as value :
private Map<Character, String> genMap(HuffmanNode root) {
Map<Character, String> map = new HashMap<Character, String>();
huffmanTreeAdd(map, root, "");
return map;
}
private void huffmanTreeAdd(Map<Character, String> map, HuffmanNode root, String path) {
if (root.isLeaf()) {
map.put(root.character, path);
}
else {
huffmanTreeAdd(map, root.left, path + '0');
huffmanTreeAdd(map, root.right, path + '1');
}
}
That way you get directly the path for every character read on input with String path = map.get(c);
with a hash search instead of a sequential one.
Upvotes: 1