Reputation: 933
Our users sometimes give us misspelled names or usernames, and I would like to search Active Directory for a near match, sorted by closeness (any algorithm would be fine). For example, if I try
Get-Aduser -Filter {GivenName -like "Jack"}
I can find the user Jack, but not if I search for "Jacck" or "ack".
Is there a simple way to do this?
Upvotes: 2
Views: 5682
Reputation: 27428
This partly works, using ambiguous name resolution (ANR) across several name properties, but it does not catch the "Jacck" misspelling. The command below returns all five results for me.
get-aduser -filter 'anr -eq "ack"' -ResultSetSize 5
Upvotes: 0
Reputation: 933
OK, based on the great answers that I got (thanks @boxdog and @Palle Due) I am posting a more complete one.
Major source: https://github.com/gravejester/Communary.PASM - PowerShell Approximate String Matching. Great Module for this topic.
source: https://github.com/gravejester/Communary.PASM/tree/master/Functions
# download functions to the temp folder
$urls =
"https://raw.githubusercontent.com/gravejester/Communary.PASM/master/Functions/Get-CommonPrefix.ps1" ,
"https://raw.githubusercontent.com/gravejester/Communary.PASM/master/Functions/Get-LevenshteinDistance.ps1" ,
"https://raw.githubusercontent.com/gravejester/Communary.PASM/master/Functions/Get-LongestCommonSubstring.ps1" ,
"https://raw.githubusercontent.com/gravejester/Communary.PASM/master/Functions/Get-FuzzyMatchScore.ps1"
$paths = $urls | %{$_.split("\/")|select -last 1| %{"$env:TEMP\$_"}}
[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
for($i=0;$i -lt $urls.count;$i++){
Invoke-WebRequest -Uri $urls[$i] -OutFile $paths[$i]
}
# concatenating the functions so we don't have to deal with source permissions
foreach($path in $paths){
cat $path | Add-Content "$env:TEMP\Fuzzy_score_functions.ps1"
}
# to save for later, open the temp folder with: Invoke-Item $env:TEMP
# then copy "Fuzzy_score_functions.ps1" somewhere else
# source Fuzzy_score_functions.ps1
. "$env:TEMP\Fuzzy_score_functions.ps1"
Simple test:
Get-FuzzyMatchScore "a" "abc" # 98
Create a score function:
## start function
function get_score{
param($searchQuery,$searchData,$nlist,[switch]$levd)
if($nlist -eq $null){$nlist = 10}
$scores = foreach($string in $searchData){
Try{
if($levd){
$score = Get-LevenshteinDistance $searchQuery $string }
else{
$score = Get-FuzzyMatchScore -Search $searchQuery -String $string }
Write-Output (,([PSCustomObject][Ordered] @{
Score = $score
Result = $string
}))
$I = $searchData.indexof($string)/$searchData.count*100
$I = [math]::Round($I)
Write-Progress -Activity "Search in Progress" -Status "$I% Complete:" -PercentComplete $I
}Catch{Continue}
}
if($levd) { $scores | Sort-Object Score,Result |select -First $nlist }
else {$scores | Sort-Object Score,Result -Descending |select -First $nlist }
} ## end function
Examples
get_score "Karolin" @("Kathrin","Jane","John","Cameron")
# check the difference between Fuzzy and LevenshteinDistance mode
$names = "Ferris","Cameron","Sloane","Jeanie","Edward","Tom","Katie","Grace"
"Fuzzy"; get_score "Cam" $names
"Levenshtein"; get_score "Cam" $names -levd
Test the performance on a big dataset
## download baby-names
$url = "https://github.com/hadley/data-baby-names/raw/master/baby-names.csv"
$output = "$env:TEMP\baby-names.csv"
[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
Invoke-WebRequest -Uri $url -OutFile $output
$babynames = import-csv "$env:TEMP\baby-names.csv"
$babynames.count # 258000 lines
$babynames[0..3] # year, name, percent, sex
$searchdata = $babynames.name[0..499]
$query = "Waren" # missing letter
"Fuzzy"; get_score $query $searchdata
"Levenshtein"; get_score $query $searchdata -levd
$query = "Jon" # missing letter
"Fuzzy"; get_score $query $searchdata
"Levenshtein"; get_score $query $searchdata -levd
$query = "Howie" # lookalike
"Fuzzy"; get_score $query $searchdata;
"Levenshtein"; get_score $query $searchdata -levd
Test
$query = "John"
$res = for($i=1;$i -le 10;$i++){
$searchdata = $babynames.name[0..($i*100-1)]
$meas = measure-command{$res = get_score $query $searchdata}
write-host $i
Write-Output (,([PSCustomObject][Ordered] @{
N = $i*100
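MS = [math]::Round($meas.TotalMilliseconds) # .Milliseconds is only the 0-999 component; the table below was recorded with it
MS_per_line = [math]::Round($meas.TotalMilliseconds/$searchdata.Count,2)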
}))
}
$res
+------+-----+-------------+
| N | MS | MS_per_line |
| - | -- | ----------- |
| 100 | 696 | 6.96 |
| 200 | 544 | 2.72 |
| 300 | 336 | 1.12 |
| 400 | 6 | 0.02 |
| 500 | 718 | 1.44 |
| 600 | 452 | 0.75 |
| 700 | 224 | 0.32 |
| 800 | 912 | 1.14 |
| 900 | 718 | 0.8 |
| 1000 | 417 | 0.42 |
+------+-----+-------------+
These timings look erratic. If anyone understands why, please comment.
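A likely explanation for the numbers in the table, offered as an assumption since the measurements cannot be re-run here: the `Milliseconds` property of the TimeSpan returned by Measure-Command is only the millisecond component (0 to 999), so a run of 2.006 s would show up as 6, while `TotalMilliseconds` reports the whole duration. A minimal sketch with a made-up duration:

```powershell
# A made-up elapsed time of 2 seconds and 6 milliseconds (illustrative value only)
# Constructor arguments: days, hours, minutes, seconds, milliseconds
$elapsed = [TimeSpan]::new(0, 0, 0, 2, 6)

$component = $elapsed.Milliseconds        # only the millisecond part of the duration
$total     = $elapsed.TotalMilliseconds   # the full duration expressed in milliseconds

"Milliseconds = $component, TotalMilliseconds = $total"
```

Write-Progress is also called inside the timed function, so part of each measurement is probably display overhead rather than string matching.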
The best way to do this depends on how the AD is organized. Here we have many OUs, but ordinary users are in Users and DisabledUsers. The domain and DC will also differ (I'm replacing ours here with <domain> and <DC>).
# One way to get a List of OUs
Get-ADOrganizationalUnit -Filter * -Properties CanonicalName |
Select-Object -Property CanonicalName
Then you can use Where-Object -FilterScript {} to filter per OU:
# example, saving on the temp folder
Get-ADUser -f * |
Where-Object -FilterScript {
($_.DistinguishedName -match "CN=\w*,OU=DisabledUsers,DC=<domain>,DC=<DC>" -or
$_.DistinguishedName -match "CN=\w*,OU=Users,DC=<domain>,DC=<DC>") -and
$_.GivenName -ne $null #remove users without givenname, like test users
} |
select @{n="Fullname";e={$_.GivenName+" "+$_.Surname}},
GivenName,Surname,SamAccountName |
Export-CSV -Path "$env:TEMP\all_Users.csv" -NoTypeInformation
# you can open the file to inspect
Invoke-Item "$env:TEMP\all_Users.csv"
# import
$allusers = Import-Csv "$env:TEMP\all_Users.csv"
$allusers.Count # number of lines
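One thing to check if the user count looks low: the `CN=\w*` pattern in the Where-Object filter above only matches letters, digits and underscores, so an account whose CN contains a space or a hyphen would be skipped. A small sketch with made-up distinguished names (`example.com` stands in for `<domain>` and `<DC>`, and `[^,]+` is just one possible looser pattern, not part of the original script):

```powershell
# Made-up distinguished names, for illustration only
$dns = 'CN=jdoe,OU=Users,DC=example,DC=com',
       'CN=Jane Doe,OU=Users,DC=example,DC=com',
       'CN=svc-backup,OU=Service,DC=example,DC=com'

$strict = 'CN=\w*,OU=Users,DC=example,DC=com'    # pattern style used in the filter above
$loose  = 'CN=[^,]+,OU=Users,DC=example,DC=com'  # any run of characters up to the next comma

# @() forces an array even when only one item matches
$strictHits = @($dns | Where-Object { $_ -match $strict })
$looseHits  = @($dns | Where-Object { $_ -match $loose })

"strict: $($strictHits.Count) match(es); loose: $($looseHits.Count) match(es)"
```

The looser pattern still does not handle a CN that contains an escaped comma, so treat it as a starting point.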
Usage:
get_score "Jane Done" $allusers.fullname 15 # return the first 15
get_score "jdoe" $allusers.samaccountname 15
Upvotes: 0
Reputation: 364
Interesting question and answers. A possibly simpler solution is to search on more than one attribute, since I would hope most people spell at least one of their names properly :)
Get-ADUser -Filter {GivenName -like "FirstName" -or SurName -Like "SecondName"}
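Note that `-like` without a `*` wildcard only finds exact matches, so adding wildcards widens the search a little. A rough sketch using the plain PowerShell `-like` operator on a made-up list of names, so it runs without AD. The filter string at the end is an untested assumption about how the same idea would look for Get-ADUser:

```powershell
# Made-up names, for illustration only
$names = 'Jack', 'Jackson', 'Zack'

$exact    = @($names | Where-Object { $_ -like 'Jack' })    # no wildcard: exact match only
$prefix   = @($names | Where-Object { $_ -like 'Jack*' })   # names starting with "Jack"
$contains = @($names | Where-Object { $_ -like '*ack*' })   # names containing "ack"

"exact: $exact | prefix: $prefix | contains: $contains"

# The same wildcard idea as an AD filter string (not executed here, requires the AD module):
$filter = "GivenName -like '*ack*' -or Surname -like '*ack*'"
# Get-ADUser -Filter $filter
```

Wildcards help with a missing first letter such as "ack", but not with an inserted letter such as "Jacck".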
Upvotes: 2
Reputation: 8432
The Soundex algorithm is designed for just this situation. Here is some PowerShell code that might help:
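For readers without the linked code, here is a minimal sketch of American Soundex. It is not the code the answer refers to: `Get-Soundex` is a hypothetical function name, and dropping everything except ASCII letters is a simplifying assumption.

```powershell
function Get-Soundex {
    param([string]$Name)

    # Consonant groups of American Soundex; vowels, H, W and Y have no code
    $codes = @{
        B = '1'; F = '1'; P = '1'; V = '1'
        C = '2'; G = '2'; J = '2'; K = '2'; Q = '2'; S = '2'; X = '2'; Z = '2'
        D = '3'; T = '3'
        L = '4'
        M = '5'; N = '5'
        R = '6'
    }

    # Simplification: keep ASCII letters only
    $letters = $Name.ToUpperInvariant() -replace '[^A-Z]', ''
    if ($letters.Length -eq 0) { return '' }

    $result   = $letters.Substring(0, 1)   # the first letter is kept as-is
    $lastCode = $codes[$result]            # $null when the first letter has no code
    foreach ($ch in $letters.Substring(1).ToCharArray()) {
        $key = [string]$ch
        if ($key -eq 'H' -or $key -eq 'W') { continue }   # H and W do not separate equal codes
        $code = $codes[$key]                              # $null for vowels and Y
        if ($code -and $code -ne $lastCode) { $result += $code }
        $lastCode = $code
        if ($result.Length -eq 4) { break }
    }
    $result.PadRight(4, '0')   # pad short codes with zeros
}

# Made-up candidate list: keep the names whose code equals that of the misspelled query
$names = 'Jack', 'Jacques', 'Jane', 'John'
$hits  = @($names | Where-Object { (Get-Soundex $_) -eq (Get-Soundex 'Jacck') })
$hits
```

With these rules "Jack" and "Jacck" both encode to J200, so the doubled letter is tolerated. Soundex keeps the first letter, though, so "ack" (A200) would not match "Jack".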
Upvotes: 1
Reputation: 6292
You can calculate the Levenshtein distance between the two strings and make sure it's under a certain threshold (probably 1 or 2). There is a PowerShell example here: Levenshtein distance in PowerShell
Examples:
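A minimal sketch of the threshold idea, assuming a case-insensitive comparison is wanted. `Get-EditDistance` is a hypothetical helper written for this illustration, not the function from the linked post, and the name list is made up:

```powershell
function Get-EditDistance {
    param([string]$Source, [string]$Target)

    # Lower-case both sides so the comparison ignores case (an assumption of this sketch)
    $s = $Source.ToLowerInvariant()
    $t = $Target.ToLowerInvariant()

    # Row for the empty prefix of $s: turning '' into the first $j characters of $t costs $j
    $previous = @(0..$t.Length)
    for ($i = 1; $i -le $s.Length; $i++) {
        $current = New-Object 'int[]' ($t.Length + 1)
        $current[0] = $i   # deleting the first $i characters of $s
        for ($j = 1; $j -le $t.Length; $j++) {
            $cost = if ($s[$i - 1] -eq $t[$j - 1]) { 0 } else { 1 }
            $deletion     = $previous[$j] + 1
            $insertion    = $current[$j - 1] + 1
            $substitution = $previous[$j - 1] + $cost
            $current[$j]  = [Math]::Min([Math]::Min($deletion, $insertion), $substitution)
        }
        $previous = $current
    }
    $previous[$t.Length]
}

# Made-up candidate names; keep those within distance 2 of the query, closest first
$names = 'Jack', 'Jane', 'John', 'Zack'
$query = 'Jacck'
$closest = @(
    $names |
        Select-Object @{ n = 'Name'; e = { $_ } },
                      @{ n = 'Distance'; e = { Get-EditDistance $query $_ } } |
        Where-Object { $_.Distance -le 2 } |
        Sort-Object Distance, Name
)
$closest
```

Here "Jacck" is one edit from "Jack" and two from "Zack", so only those two names pass the threshold. "ack" is also one edit away from "Jack", which covers both misspellings from the question.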
Upvotes: 3