Reputation: 31071
I have the following structures defined (names are anonymised, but data types are correct):
Public Type ExampleDataItem
Limit As Integer ' could be any value 0-999
Status As Integer ' could be any value 0-2
ValidUntil As Date ' always a valid date
End Type
Public Type ExampleData
Name As String ' could be 5-20 chars long
ValidOn As Date ' could be valid date or 1899-12-30 representing "null"
Salt As Integer ' random value 42-32767
Items(0 To 13) As ExampleDataItem
End Type
I would like to generate a 32-bit hash code for an ExampleData
instance. Minimising hash collisions is important, performance and data order is not important.
So far I have got (in pseudocode):
Long
value.Long
values together.I can't really post my code because it's heavily dependent on utility classes to do the serialisation, but if anyone wants to see it regardless then I will post it.
Will this be OK, or can anyone suggest a better way of doing it?
EDIT:
This code is being used to implement part of a software licensing system. The purpose of the hash is to confirm whether the data entered by the end user equals the data entered by the tech support person. The hash must therefore:
The hash is not required for lookup, uniqueness testing, or to store ExampleData
instances in any kind of collection, but only for the one purpose described above.
Upvotes: 0
Views: 632
Reputation: 14041
You may be overthinking it, or I'm not understanding the issue. You could essentially just
hash(CStr(Salt) + Name + CStr(ValidOn) + Anyotherstrings
There is no particular need to go through the process of serializing into byte array and XORing values. Infact XORing values together in that way is more likely to create hash collisions where you aren't intending them.
Edit: I think I understand now. You're creating your own hash value by XORing the data together? It's unfortunately quite likely to give collisions. I know VB6 doesn't include any hashing algorithms, so you may be best importing and using something like Phil Fresle's SHA256 implementation.
Upvotes: 0
Reputation: 12051
Can you use the CRC32? Steve McMahon has an implementation. Combine that with a bit of base32 encoding and you've got something short enough to read over the phone.
Upvotes: 3
Reputation: 30408
EDIT: the question has now been edited to clarify that the goal is detecting typing errors, not minimizing collisions between totally different values. In that case Dan F's answer is the best one IMHO, not my offering below (wonderful though it is).
You could use the Microsoft CryptoAPI rather than rolling your own hash algorithm.
EDIT: Following comment. If you insist on a 32-bit value, it will be hard to minimize hash collisions. My algorithm book suggests using Horner's method as a decent general purpose hashing algorithm. I don't have time right now to find out more information and implement in VB6. CopyMemory would probably be useful :)
Upvotes: 0
Reputation: 7196
Considering that performance is not an objective, if file size is not important and you want a unique value for each item. Just add an ID field. It data type is a string. Then use this function to generate a GUID. This will be a unique ID. Use it as a key for a dictonary or collection.
Public Type GUID
Data1 As Long
Data2 As Integer
Data3 As Integer
Data4(7) As Byte
End Type
Public Type GUID2 '15 BYTES TOTAL
Data1(14) As Byte
End Type
Public Declare Function CoCreateGuid Lib "OLE32.DLL" (pGuid As GUID) As Long
Public Function GetGUID() As String
Dim VBRIG_PROC_ID_STRING As String
VBRIG_PROC_ID_STRING = "GetGUID()"
Dim lResult As Long
Dim lguid As GUID
Dim MyguidString As String
Dim MyGuidString1 As String
Dim MyGuidString2 As String
Dim MyGuidString3 As String
Dim DataLen As Integer
Dim StringLen As Integer
Dim i As Integer
On Error GoTo error_olemsg
lResult = CoCreateGuid(lguid)
If lResult = 0 Then
MyGuidString1 = Hex$(lguid.Data1)
StringLen = Len(MyGuidString1)
DataLen = Len(lguid.Data1)
MyGuidString1 = LeadingZeros(2 * DataLen, StringLen) & MyGuidString1
'First 4 bytes (8 hex digits)
MyGuidString2 = Hex$(lguid.Data2)
StringLen = Len(MyGuidString2)
DataLen = Len(lguid.Data2)
MyGuidString2 = LeadingZeros(2 * DataLen, StringLen) & Trim$(MyGuidString2)
'Next 2 bytes (4 hex digits)
MyGuidString3 = Hex$(lguid.Data3)
StringLen = Len(MyGuidString3)
DataLen = Len(lguid.Data3)
MyGuidString3 = LeadingZeros(2 * DataLen, StringLen) & Trim$(MyGuidString3)
'Next 2 bytes (4 hex digits)
GetGUID = MyGuidString1 & MyGuidString2 & MyGuidString3
For i = 0 To 7
MyguidString = MyguidString & Format$(Hex$(lguid.Data4(i)), "00")
Next i
'MyGuidString contains last 8 bytes of Guid (16 hex digits)
GetGUID = GetGUID & MyguidString
Else
GetGUID = "00000000" ' return zeros if function unsuccessful
End If
Exit Function
error_olemsg:
GetGUID = "00000000"
Exit Function
End Function
Public Function LeadingZeros(ExpectedLen As Integer, ActualLen As Integer) As String
LeadingZeros = String$(ExpectedLen - ActualLen, "0")
End Function
Upvotes: 0