Reputation: 743
I am working on analyzing the time complexity of a solution I wrote to the Leetcode question Vertical Order Traversal of a Binary Tree.
The question provides a binary tree and asks for a list of lists, where each inner list corresponds to a vertical column of the tree and is sorted from top to bottom, with values in the same row and column sorted numerically.
My code is shown below, but to summarize it: I perform a breadth-first search level by level using two queues. I similarly use two maps. One map holds the relationship between columns and the values in each column for the entire tree; the other map serves a similar purpose, but it holds the values for only a single level at a time. This allows me to sort potential ties within each level and then add that level's values to the aforementioned full map. At the end of the program, the map with information across all levels is converted to a list of lists for the Leetcode grader.
I am having trouble analyzing the time complexity of the sorting logic. The worst case is the input that requires the most sorting, which is a complete binary tree. Here is what I have been thinking so far:
For this complete binary tree, the nodes have been labeled with their respective columns, and at the top, I show each column's frequency at each level. I have included this data in the image, as Stack Overflow was not rendering the tables properly after I submitted this post.
The magnitudes of the column numbers are not particularly important, but the pattern that emerges in the frequencies is that of Pascal's Triangle / the binomial coefficients. This makes sense in hindsight, since the right/left children take the parent node's column plus/minus 1. The frequency of a column number indicates the size of the list at that level that would have to be sorted (the bottleneck).
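A quick way to see the Pascal's Triangle pattern is to simulate the column rule (left child gets col - 1, right child gets col + 1) directly, without building any tree. The sketch below is illustrative only (the class and method names are mine, not part of the solution); it tallies column frequencies level by level for a complete tree:

```java
import java.util.*;

public class ColumnFrequencyDemo {
    // Column counts at a given level of a complete binary tree, obtained by
    // simulating the col - 1 / col + 1 rule level by level.
    static List<Integer> columnFrequencies(int level) {
        List<Integer> cols = List.of(0); // root sits at column 0
        for (int l = 0; l < level; l++) {
            List<Integer> next = new ArrayList<>();
            // every node spawns a left child (col - 1) and right child (col + 1)
            for (int c : cols) { next.add(c - 1); next.add(c + 1); }
            cols = next;
        }
        // tally how many nodes fall into each column; TreeMap keeps columns sorted
        TreeMap<Integer, Integer> freq = new TreeMap<>();
        for (int c : cols) freq.merge(c, 1, Integer::sum);
        return new ArrayList<>(freq.values());
    }

    public static void main(String[] args) {
        for (int level = 0; level <= 5; level++)
            System.out.println("level " + level + ": " + columnFrequencies(level));
        // level 4 prints [1, 4, 6, 4, 1], the binomial coefficients C(4, k)
    }
}
```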
Given this, I can make the recurrence
In the recurrence, L refers to the level in the binary tree. The total work up to each level is the work done for all previous levels, plus the work to sort the variable-size lists at this level (the sizes vary according to the binomial coefficients). From here, I am not sure where to go. L goes up to log N for N total elements in the tree.
Intuition leads me to believe that this recurrence yields the following expression,
but this is still not very clear, and I am also not sure if it is even right (like I said, intuition, not a proof, led me to this). Does this seem right? Is there a way to simplify this that I do not yet see?
I have attached the Java code below that implements this. Thank you, and please let me know if you have any questions.
```java
/**
 * Definition for a binary tree node.
 * public class TreeNode {
 *     int val;
 *     TreeNode left;
 *     TreeNode right;
 *     TreeNode() {}
 *     TreeNode(int val) { this.val = val; }
 *     TreeNode(int val, TreeNode left, TreeNode right) {
 *         this.val = val;
 *         this.left = left;
 *         this.right = right;
 *     }
 * }
 */
class Solution {
    /*
     * Map<TreeNode, Integer> to keep track of the column for each TreeNode
     * Map<Integer, List<Integer>> to keep track of columns and the values in each column
     * Map<Integer, List<Integer>> for the same purpose, but for one row at a time
     *
     * Breadth-first search:
     * - as nodes are added to the queue, record their columns in the first map
     * - use a two-queue approach to separate each level of the BFS
     * - when a level of the BFS is finished, sort the per-level map's lists,
     *   add them all to the full table, and clear the per-level map
     */
    public List<List<Integer>> verticalTraversal(TreeNode root) {
        Map<TreeNode, Integer> columns = new HashMap<>();
        final Map<Integer, List<Integer>> table = new HashMap<>();
        Map<Integer, List<Integer>> levelTable;
        Queue<TreeNode> bfsQueue;
        Queue<TreeNode> nextLevel = new ArrayDeque<>();
        int minColOverall = Integer.MAX_VALUE;
        nextLevel.offer(root);
        columns.put(root, 0);
        while (!nextLevel.isEmpty()) {
            bfsQueue = nextLevel;
            nextLevel = new ArrayDeque<>();
            levelTable = new HashMap<>();
            int minCol = Integer.MAX_VALUE;
            while (!bfsQueue.isEmpty()) { // Evaluate each level individually
                TreeNode node = bfsQueue.poll();
                int column = columns.remove(node);
                minCol = Math.min(minCol, column);
                if (!levelTable.containsKey(column))
                    levelTable.put(column, new ArrayList<Integer>());
                levelTable.get(column).add(node.val);
                if (node.left != null) {
                    nextLevel.offer(node.left);
                    columns.put(node.left, column - 1);
                }
                if (node.right != null) {
                    nextLevel.offer(node.right);
                    columns.put(node.right, column + 1);
                }
            }
            minColOverall = Math.min(minCol, minColOverall);
            // After each level, add the sorted column values to the final table.
            // Columns within a level share the same parity, hence m += 2.
            for (int m = minCol; !levelTable.isEmpty(); m += 2) {
                if (!levelTable.containsKey(m))
                    continue;
                List<Integer> level = levelTable.remove(m);
                Collections.sort(level);
                if (!table.containsKey(m))
                    table.put(m, new ArrayList<Integer>());
                table.get(m).addAll(level);
            }
        }
        // Transform map to list
        final List<List<Integer>> answer = new ArrayList<>();
        while (!table.isEmpty())
            answer.add(table.remove(minColOverall++));
        return answer;
    }
}
```
Upvotes: 0
Views: 304
Reputation: 46960
That's really nice. This is an interesting problem.
I'll just note that you can get a more concise implementation with sorted data structures. Of course, the tree maps are a bit slower per operation.
```java
import static java.util.stream.Collectors.toList;

class Solution {
    // column -> (row -> values); TreeMaps keep both levels sorted automatically
    private final TreeMap<Integer, TreeMap<Integer, List<Integer>>> accumulator =
            new TreeMap<>();

    private void search(TreeNode node, int i, int j) {
        if (node == null) return;
        accumulator.computeIfAbsent(j, k -> new TreeMap<>())
                   .computeIfAbsent(i, k -> new ArrayList<>())
                   .add(node.val);
        search(node.left, i + 1, j - 1);
        search(node.right, i + 1, j + 1);
    }

    public List<List<Integer>> verticalTraversal(TreeNode root) {
        search(root, 0, 0);
        return accumulator.values().stream().map(colMap -> {
            colMap.values().forEach(Collections::sort);
            return colMap.values().stream().flatMap(List::stream).collect(toList());
        }).collect(toList());
    }
}
```
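For anyone who wants to run this outside Leetcode, here is a self-contained driver around the solution above. The `TreeNode` class mirrors Leetcode's definition, and the `Demo` class is just scaffolding for this post; the tree is Leetcode's example input `[3,9,20,null,null,15,7]`:

```java
import java.util.*;
import static java.util.stream.Collectors.toList;

class TreeNode {
    int val;
    TreeNode left, right;
    TreeNode(int val) { this.val = val; }
    TreeNode(int val, TreeNode left, TreeNode right) {
        this.val = val; this.left = left; this.right = right;
    }
}

class Solution {
    // column -> (row -> values), both kept sorted by the TreeMaps
    private final TreeMap<Integer, TreeMap<Integer, List<Integer>>> accumulator =
            new TreeMap<>();

    private void search(TreeNode node, int i, int j) {
        if (node == null) return;
        accumulator.computeIfAbsent(j, k -> new TreeMap<>())
                   .computeIfAbsent(i, k -> new ArrayList<>())
                   .add(node.val);
        search(node.left, i + 1, j - 1);
        search(node.right, i + 1, j + 1);
    }

    public List<List<Integer>> verticalTraversal(TreeNode root) {
        search(root, 0, 0);
        return accumulator.values().stream().map(colMap -> {
            colMap.values().forEach(Collections::sort); // break row/column ties numerically
            return colMap.values().stream().flatMap(List::stream).collect(toList());
        }).collect(toList());
    }
}

public class Demo {
    public static void main(String[] args) {
        // Leetcode example tree: [3,9,20,null,null,15,7]
        TreeNode root = new TreeNode(3,
                new TreeNode(9),
                new TreeNode(20, new TreeNode(15), new TreeNode(7)));
        System.out.println(new Solution().verticalTraversal(root));
        // prints [[9], [3, 15], [20], [7]]
    }
}
```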
Upvotes: 1
Reputation: 743
I believe I have come up with an explanation for an upper bound.
The time complexity is that of the traversal plus the time complexity of the sorting operations. The sorting will dominate the traversal, so that is what I will focus on.
The key to this explanation is that although the size of each column's list within a level varies according to the binomial coefficients, this variability can be ignored if one works with the average input size instead.
Let the time complexity be bounded by:
O(number of total sorts * average sort complexity)
We sort once per column in each row, so the total number of sorts is the sum over all rows of the number of columns in that row. There are O(log N) rows in a complete binary tree of N nodes. As seen in the above diagram, the number of unique columns in each row grows by 1 per row: the first row has 1 unique column, the second row 2, the third row 3, and so on, until the final row, row log N, has log N unique columns. This is the sum of the integers from 1 to log N, which equals log N (1 + log N) / 2 and is O((log N)^2).
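The per-row column counts, and hence the triangular-number total, can be double-checked by simulating the column assignment. This sketch is purely illustrative (the class and method names are mine):

```java
import java.util.*;

public class SortCountDemo {
    // Count the sorts performed for a complete tree with the given number of rows
    // by simulating the col - 1 / col + 1 column assignment: one sort per
    // distinct column per row.
    static int totalSorts(int rows) {
        List<Integer> cols = List.of(0); // root at column 0
        int sorts = 0;
        for (int r = 1; r <= rows; r++) {
            sorts += new HashSet<>(cols).size(); // distinct columns in this row
            List<Integer> next = new ArrayList<>();
            for (int c : cols) { next.add(c - 1); next.add(c + 1); }
            cols = next;
        }
        return sorts;
    }

    public static void main(String[] args) {
        int rows = 10;
        System.out.println(totalSorts(rows)); // 55, matching rows * (rows + 1) / 2
    }
}
```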
The total size of all inputs to be sorted is N, since each node belongs to exactly one column in one row. This means the average sort size is O(N / (log N)^2). Let M = N / (log N)^2.
This means the overall time complexity is:
O(number of total sorts * average sort complexity) =
O((logN)^2 * M log M) =
O((logN)^2 * (N / (logN)^2) log [N / (logN)^2]) =
O(N log [N / (logN)^2])
This bound is tighter than O(N log N), but O(N log N) is still technically correct.
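As a numerical sanity check on the bound, one can compute the exact comparison-model sort cost (each list of size s contributing s log2 s) for a complete tree and compare it against N log2 N. The sketch below is hypothetical scaffolding (names are mine); it only confirms the cost sits below N log N for one concrete size, not the asymptotic claim itself:

```java
import java.util.*;

public class SortCostBoundDemo {
    // Exact comparison-model sort cost for a complete binary tree with the
    // given number of rows: sum over every row and column of size * log2(size).
    static double sortCost(int rows) {
        List<Integer> cols = List.of(0); // root at column 0
        double cost = 0;
        for (int r = 0; r < rows; r++) {
            // per-column counts at this row (these are binomial coefficients)
            Map<Integer, Integer> freq = new HashMap<>();
            for (int c : cols) freq.merge(c, 1, Integer::sum);
            for (int size : freq.values())
                if (size > 1) cost += size * (Math.log(size) / Math.log(2));
            List<Integer> next = new ArrayList<>();
            for (int c : cols) { next.add(c - 1); next.add(c + 1); }
            cols = next;
        }
        return cost;
    }

    public static void main(String[] args) {
        int rows = 16;
        double n = (1 << rows) - 1;                     // N = 2^16 - 1 nodes
        double bound = n * (Math.log(n) / Math.log(2)); // N log2 N
        System.out.printf("exact sort cost: %.0f, N log N: %.0f%n",
                sortCost(rows), bound);
    }
}
```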
Upvotes: 1