RBA

Reputation: 933

Partial/near match for name and/or username in Active Directory / Powershell

Our users sometimes give us misspelled names or usernames, and I would like to search Active Directory for near matches, sorted by closeness (any algorithm would be fine). For example, if I try

Get-Aduser -Filter {GivenName -like "Jack"}

I can find the user Jack, but not if I search for "Jacck" or "ack".

Is there a simple way to do this?

Upvotes: 2

Views: 5682

Answers (5)

js2010

Reputation: 27428

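Ambiguous name resolution (ANR) gets part of the way there: it matches the search term against several name properties at once, but it does not catch the "Jacck" misspelling. The command below returns all five results I asked for.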

get-aduser -filter 'anr -eq "ack"' -ResultSetSize 5

Upvotes: 0

RBA

Reputation: 933

OK, based on the great answers that I got (thanks @boxdog and @Palle Due) I am posting a more complete one.

Major source: https://github.com/gravejester/Communary.PASM - PowerShell Approximate String Matching. Great Module for this topic.

1) FuzzyMatchScore function

source: https://github.com/gravejester/Communary.PASM/tree/master/Functions

# download functions to the temp folder
$urls = 
"https://raw.githubusercontent.com/gravejester/Communary.PASM/master/Functions/Get-CommonPrefix.ps1"    ,
"https://raw.githubusercontent.com/gravejester/Communary.PASM/master/Functions/Get-LevenshteinDistance.ps1" ,
"https://raw.githubusercontent.com/gravejester/Communary.PASM/master/Functions/Get-LongestCommonSubstring.ps1"  ,
"https://raw.githubusercontent.com/gravejester/Communary.PASM/master/Functions/Get-FuzzyMatchScore.ps1" 

# take the file name after the last "/" of each URL and place it in the temp folder
$paths = $urls | ForEach-Object { Join-Path $env:TEMP ($_.Split('/')[-1]) }

[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
for($i=0;$i -lt $urls.count;$i++){
Invoke-WebRequest -Uri $urls[$i] -OutFile $paths[$i]
}

# concatenating the functions so we don't have to deal with source permissions
# Set-Content overwrites the target, so re-running the script does not duplicate the functions
Get-Content -Path $paths | Set-Content "$env:TEMP\Fuzzy_score_functions.ps1"

# to save for later, open the temp folder with: Invoke-Item $env:TEMP 
# then copy "Fuzzy_score_functions.ps1" somewhere else

# source Fuzzy_score_functions.ps1
. "$env:TEMP\Fuzzy_score_functions.ps1"

Simple test:

Get-FuzzyMatchScore "a" "abc" # 98

Create a score function:

## start function
function get_score {
    param(
        $searchQuery,      # string to look for
        $searchData,       # collection of candidate strings
        $nlist = 10,       # number of results to return
        [switch]$levd      # use the Levenshtein distance instead of the fuzzy score
    )

    $total = @($searchData).Count
    $done  = 0

    $scores = foreach ($string in $searchData) {
        $done++
        try {
            if ($levd) {
                $score = Get-LevenshteinDistance $searchQuery $string
            }
            else {
                $score = Get-FuzzyMatchScore -Search $searchQuery -String $string
            }
            [PSCustomObject]@{
                Score  = $score
                Result = $string
            }
        }
        catch { continue }   # skip entries the scoring functions cannot handle

        # a running counter avoids a linear IndexOf lookup on every item
        $pct = [math]::Round($done / $total * 100)
        Write-Progress -Activity "Search in Progress" -Status "$pct% Complete:" -PercentComplete $pct
    }

    # Levenshtein: smaller is closer; fuzzy score: larger is closer
    if ($levd) { $scores | Sort-Object Score, Result | Select-Object -First $nlist }
    else       { $scores | Sort-Object Score, Result -Descending | Select-Object -First $nlist }
} ## end function

Examples

get_score "Karolin" @("Kathrin","Jane","John","Cameron")

# check the difference between Fuzzy and LevenshteinDistance mode
$names = "Ferris","Cameron","Sloane","Jeanie","Edward","Tom","Katie","Grace"
"Fuzzy"; get_score "Cam" $names
"Levenshtein"; get_score "Cam" $names -levd

Test the performance on a big dataset

## download baby-names

$url = "https://github.com/hadley/data-baby-names/raw/master/baby-names.csv"
$output = "$env:TEMP\baby-names.csv"
[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12
Invoke-WebRequest -Uri $url -OutFile $output
$babynames = import-csv "$env:TEMP\baby-names.csv"
$babynames.count # 258000 lines

$babynames[0..3] # year, name, percent, sex

$searchdata = $babynames.name[0..499]

$query = "Waren" # missing letter
"Fuzzy"; get_score $query $searchdata
"Levenshtein"; get_score $query $searchdata -levd

$query = "Jon" # missing letter
"Fuzzy"; get_score $query $searchdata
"Levenshtein"; get_score $query $searchdata -levd

$query = "Howie" # lookalike
"Fuzzy"; get_score $query $searchdata;
"Levenshtein"; get_score $query $searchdata -levd

Test

$query = "John"

$res = for($i=1;$i -le 10;$i++){
    $searchdata = $babynames.name[0..($i*100-1)]
    $meas = Measure-Command { $null = get_score $query $searchdata }
    Write-Host $i
    # TotalMilliseconds is the full elapsed time; Milliseconds is only the 0-999 component
    [PSCustomObject]@{
        N = $i*100
        MS = [math]::Round($meas.TotalMilliseconds)
        MS_per_line = [math]::Round($meas.TotalMilliseconds/$searchdata.Count,2)
    }
}
$res

+------+-----+-------------+
| N    | MS  | MS_per_line |
| -    | --  | ----------- |
| 100  | 696 | 6.96        |
| 200  | 544 | 2.72        |
| 300  | 336 | 1.12        |
| 400  | 6   | 0.02        |
| 500  | 718 | 1.44        |
| 600  | 452 | 0.75        |
| 700  | 224 | 0.32        |
| 800  | 912 | 1.14        |
| 900  | 718 | 0.8         |
| 1000 | 417 | 0.42        |
+------+-----+-------------+

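These times look erratic. A likely cause: the table was recorded from the Milliseconds property of the Measure-Command result, which holds only the millisecond component of the elapsed time (0 to 999), so any run longer than one second wraps around. TotalMilliseconds reports the full duration. If anyone sees another reason, please comment.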
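To illustrate how the Milliseconds property of a TimeSpan can give misleading timings, here is a small sketch. The 2006 ms duration is a made-up value for illustration, not a measurement:

```powershell
# A 2.006 second duration (illustrative value only)
$elapsed = [TimeSpan]::FromMilliseconds(2006)

$elapsed.Seconds            # 2    (whole-seconds component)
$elapsed.Milliseconds       # 6    (millisecond component, always 0..999)
$elapsed.TotalMilliseconds  # 2006 (full duration in milliseconds)
```

A run that took just over two seconds would therefore show up as 6 ms if only the Milliseconds component is read.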

2) Generate a table of Names from Active Directory

The best way to do this depends on the organization of the AD. Here we have many OUs, but common users will be in Users and DisabledUsers. Also Domain and DC will be different (I'm changing ours here to <domain> and <DC>).

# One way to get a List of OUs
Get-ADOrganizationalUnit -Filter * -Properties CanonicalName | 
  Select-Object -Property CanonicalName

Then you can use Where-Object -FilterScript {} to filter per OU.

# example, saving on the temp folder
Get-ADUser -f * |
 Where-Object -FilterScript {
    ($_.DistinguishedName -match "CN=\w*,OU=DisabledUsers,DC=<domain>,DC=<DC>" -or
    $_.DistinguishedName -match "CN=\w*,OU=Users,DC=<domain>,DC=<DC>") -and
    $_.GivenName -ne $null #remove users without givenname, like test users
    } | 
    select @{n="Fullname";e={$_.GivenName+" "+$_.Surname}},
    GivenName,Surname,SamAccountName |
    Export-CSV -Path "$env:TEMP\all_Users.csv" -NoTypeInformation
# you can open the file to inspect 
Invoke-Item "$env:TEMP\all_Users.csv"
# import
$allusers = Import-Csv "$env:TEMP\all_Users.csv"
$allusers.Count # number of lines

Usage:

get_score "Jane Done" $allusers.fullname 15 # return the 15 first
get_score "jdoe" $allusers.samaccountname 15
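Since get_score only returns the matched string, it can be handy to attach the account details afterwards. A minimal sketch, assuming the score rows have the Score and Result properties produced above and the user rows have the Fullname and SamAccountName columns from the CSV. All sample data below is made up:

```powershell
# Sample rows shaped like all_Users.csv (made-up data)
$users = @(
    [PSCustomObject]@{ Fullname = 'Jane Doe';  SamAccountName = 'jdoe'  }
    [PSCustomObject]@{ Fullname = 'John Dove'; SamAccountName = 'jdove' }
)
# Sample rows shaped like get_score output (made-up scores)
$scores = @(
    [PSCustomObject]@{ Score = 95; Result = 'Jane Doe'  }
    [PSCustomObject]@{ Score = 80; Result = 'John Dove' }
)

# Index the user rows by full name; several accounts may share one name
$byName = $users | Group-Object -Property Fullname -AsHashTable -AsString

# Emit one row per matching account, keeping the score order
$matched = foreach ($s in $scores) {
    foreach ($u in $byName[$s.Result]) {
        [PSCustomObject]@{
            Score          = $s.Score
            Fullname       = $u.Fullname
            SamAccountName = $u.SamAccountName
        }
    }
}
$matched
```

With real data, $users would be $allusers and $scores the output of get_score.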

Upvotes: 0

Dave

Reputation: 364

Interesting question and answers. But a possible simpler solution is to search by more than one attribute as I would hope most people would spell one of their names properly :)

Get-ADUser -Filter {GivenName -like "FirstName" -or SurName -Like "SecondName"}
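Without wildcards, -like behaves like an exact match, so a fragment such as "ack" needs asterisks around it. A rough sketch of building such a filter string follows. New-NameContainsFilter is a hypothetical helper, not part of the ActiveDirectory module, and it assumes the term contains only word characters, spaces and hyphens so that no escaping is needed:

```powershell
# Hypothetical helper: builds a -Filter string that matches the term
# anywhere in the first or last name.
function New-NameContainsFilter {
    param([Parameter(Mandatory)][string]$Term)

    # Assumption: reject anything that would need escaping in the filter
    if ($Term -notmatch '^[\w \-]+$') {
        throw "Unsupported characters in search term: $Term"
    }
    "GivenName -like '*$Term*' -or Surname -like '*$Term*'"
}

$filter = New-NameContainsFilter -Term 'ack'
$filter
# With the ActiveDirectory module loaded, the string can be passed on:
# Get-ADUser -Filter $filter
```

This finds "Jack" from "ack", but it still does not help with a misspelling like "Jacck".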

Upvotes: 2

boxdog

Reputation: 8432

The Soundex algorithm is designed for just this situation. Here is some PowerShell code that might help:

Get-Soundex.ps1
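For readers without access to the linked script, here is a minimal sketch of American Soundex. It is not the linked Get-Soundex.ps1; Get-SimpleSoundex is a name made up for this example, and it handles ASCII letters only:

```powershell
function Get-SimpleSoundex {
    param([Parameter(Mandatory)][string]$Name)

    # Keep ASCII letters only, upper-cased
    $letters = $Name.ToUpperInvariant() -replace '[^A-Z]', ''
    if ($letters.Length -eq 0) { return '' }

    # Digit for one letter; empty string for vowels, Y, H and W
    $digitOf = {
        param([string]$c)
        switch -Regex ($c) {
            '[BFPV]'     { '1' }
            '[CGJKQSXZ]' { '2' }
            '[DT]'       { '3' }
            'L'          { '4' }
            '[MN]'       { '5' }
            'R'          { '6' }
            default      { '' }
        }
    }

    $result = $letters.Substring(0, 1)
    $prev   = & $digitOf $result    # code of the first letter, to suppress an immediate repeat
    foreach ($ch in $letters.Substring(1).ToCharArray()) {
        $c = [string]$ch
        if ($c -eq 'H' -or $c -eq 'W') { continue }   # H and W do not separate equal codes
        $d = & $digitOf $c
        if ($d -ne '' -and $d -ne $prev) { $result += $d }
        $prev = $d                                     # a vowel resets $prev to ''
    }

    # Pad with zeros and keep the first letter plus three digits
    $result.PadRight(4, [char]'0').Substring(0, 4)
}

Get-SimpleSoundex 'Jack'    # J200
Get-SimpleSoundex 'Jacck'   # J200
```

Note that Soundex always keeps the first letter, so "ack" (A200) would not be matched to "Jack" (J200) this way.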

Upvotes: 1

Palle Due

Reputation: 6292

You can calculate the Levenshtein distance (LD) between the two strings and accept a match only if it is under a certain threshold (probably 1 or 2). There is a PowerShell example here: Levenshtein distance in powershell

Examples:

  • Jack and Jacck have an LD of 1.
  • Jack and ack have an LD of 1.
  • Palle and Havnefoged have an LD of 8.
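The distances above can be reproduced with a short sketch. This is not the code from the linked answer; Get-EditDistance is a name made up for this example, it compares case-insensitively by default, and the candidate names are sample data:

```powershell
function Get-EditDistance {
    param([string]$First, [string]$Second, [switch]$CaseSensitive)

    if (-not $CaseSensitive) {
        $First  = $First.ToLowerInvariant()
        $Second = $Second.ToLowerInvariant()
    }
    $n = $First.Length
    $m = $Second.Length

    # Previous row of the table: distance from '' to each prefix of $Second
    $prev = @(0..$m)
    for ($i = 1; $i -le $n; $i++) {
        $curr = [int[]]::new($m + 1)
        $curr[0] = $i
        for ($j = 1; $j -le $m; $j++) {
            $cost = if ($First[$i - 1] -ceq $Second[$j - 1]) { 0 } else { 1 }
            # cheapest of insertion, deletion and substitution
            $curr[$j] = [Math]::Min([Math]::Min($curr[$j - 1] + 1, $prev[$j] + 1), $prev[$j - 1] + $cost)
        }
        $prev = $curr
    }
    $prev[$m]
}

# Keep only candidates within a threshold of 2, closest first
$candidates = 'Jack', 'Jane', 'John', 'Zack'   # sample names
$query = 'Jacck'
$close = $candidates |
    Select-Object @{n='Name';e={$_}}, @{n='Distance';e={ Get-EditDistance $query $_ }} |
    Where-Object Distance -le 2 |
    Sort-Object Distance, Name
$close
```

A threshold of 1 or 2 works for short names; for longer strings a limit relative to the string length may be more suitable.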

Upvotes: 3
