Reputation: 87
Unfortunately, I'm working with some less-than-savoury data structures in Java, and I have the task of removing what I'm calling "vertical duplicates" in a multi-value map (Map<Enum, List<String>>) in which all values (List<String>) are of the same size.
Here's an example of what I mean:
{
    // Column:  1    2    3    4    5    6    7    8    9
    NUMBER : [ "1", "2", "3", "1", "2", "3", "1", "2", "3" ],
    LETTER : [ "A", "B", "C", "A", "E", "F", "G", "B", "I" ],
    SYMBOL : [ "!", "!", "!", "!", "!", "!", "!", "!", "!" ],
    ...
}
A "vertical duplicate" would be any column which has the same values as any previous column. In the above map, the duplicates would be columns [1,4] (both with values 1,A,!
), and [2,8] (both with values 2,B,!
).
The output from the above map, after removing "vertical duplicates":
{
    // Previous column:
    //          1    2    3    5    6    7    9
    NUMBER : [ "1", "2", "3", "2", "3", "1", "3" ],
    LETTER : [ "A", "B", "C", "E", "F", "G", "I" ],
    SYMBOL : [ "!", "!", "!", "!", "!", "!", "!" ],
    ...
}
Is there an easy way to remove the "vertical duplicates"? I'm working with multi-value maps with varying key-set sizes (one map may have 3 different enum keys, another may have 17 different enum keys), as well as varying value-set sizes (one map may contain lists each with a size of 2, another may contain lists each with a size of 20).
Upvotes: 0
Views: 531
Reputation: 10082
I would recommend using a column-based data structure instead of a row-based one for this data. At the very least you can use such a structure for this operation, and then add a simple method to turn the result back into a row-wise multimap afterwards (a sketch of that conversion follows the example output below). Here is how it could look in a fully functional example:
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Map.Entry;
import java.util.Objects;
import java.util.Set;

public enum Row {
    NUMBER, LETTER, SYMBOL, WHATEVER1, WHATEVER2
}

public static class Col {
    Map<Row, String> col = new HashMap<>();

    public Col(Entry<Row, String>... entries) {
        for (Entry<Row, String> entry : entries) {
            col.put(entry.getKey(), entry.getValue());
        }
    }

    // equals/hashCode let a LinkedHashSet detect duplicate columns
    @Override
    public boolean equals(Object other) {
        if (this == other) return true;
        if (other == null || getClass() != other.getClass()) return false;
        return Objects.equals(col, ((Col) other).col);
    }

    @Override
    public int hashCode() { return Objects.hash(col); }

    @Override
    public String toString() { return col.toString(); }
}

public static void main(String[] argv) {
    // alternatively, insert into a LinkedHashSet directly
    List<Col> cols = new ArrayList<>();
    cols.add(new Col(new SimpleEntry<>(Row.NUMBER, "1"), new SimpleEntry<>(Row.LETTER, "A"), new SimpleEntry<>(Row.WHATEVER1, "X")));
    cols.add(new Col(new SimpleEntry<>(Row.NUMBER, "2"), new SimpleEntry<>(Row.LETTER, "B"), new SimpleEntry<>(Row.SYMBOL, "!")));
    cols.add(new Col(new SimpleEntry<>(Row.NUMBER, "1"), new SimpleEntry<>(Row.LETTER, "A"), new SimpleEntry<>(Row.WHATEVER1, "X")));
    // copying into a LinkedHashSet removes duplicate columns while keeping insertion order
    Set<Col> unique = new LinkedHashSet<>(cols);
    System.out.println(unique);
}
Prints
[{LETTER=A, NUMBER=1, WHATEVER1=X}, {LETTER=B, NUMBER=2, SYMBOL=!}]
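For completeness, a minimal sketch of the conversion back to a row-wise multimap mentioned above; the method name toRowMap and the choice to fill missing rows with null are my own assumptions:

// requires java.util.Collection and java.util.EnumMap in addition to the imports above
public static Map<Row, List<String>> toRowMap(Collection<Col> cols) {
    // EnumMap keeps the rows in declaration order
    Map<Row, List<String>> rows = new EnumMap<>(Row.class);
    for (Row row : Row.values()) {
        List<String> values = new ArrayList<>();
        for (Col c : cols) {
            // null where a column has no entry for this row
            values.add(c.col.get(row));
        }
        rows.put(row, values);
    }
    return rows;
}

For the set above, toRowMap(unique) would produce {NUMBER=[1, 2], LETTER=[A, B], SYMBOL=[null, !], WHATEVER1=[X, null], WHATEVER2=[null, null]}.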
Upvotes: 2
Reputation: 44960
There are different approaches depending on your time and space constraints, but you can build a histogram of the column values (e.g. a Map<String, Integer> keyed by each column's concatenated values) and then drop every occurrence after the first of any column whose count is 2 or more. This should be more efficient than comparing every column with every other column.
Upvotes: 0