brettdj

Reputation: 55682

Return duplicate names (including partial matches)

Excel guy here who occasionally turns to automating via VBA.

I tried to solve https://stackoverflow.com/q/36538022/641067 (now closed) and couldn't get there with my basic PowerShell knowledge and Google-fu alone.

In essence the problem the OP presented is:

  1. There is a list of names in a text file.
  2. The aim is to capture only those names that occur more than once (so discard unique names; see point 3).
  3. Names occurring more than once include partial matches, i.e. Will and William are considered duplicates and should both be retained, whereas Bill is not a duplicate of William.

I tried various approaches, but I was stymied by part (3). I suspect that a loop is required to do this, but I am curious whether there is a direct PowerShell approach.

Looking forward to hearing from the experts.

What I tried:

$a = Get-Content "c:\temp\in.txt"
$b = $a | Select-Object -Unique
[regex] $a_regex = '(?i)(' + (($a | ForEach-Object { [regex]::Escape($_) }) -join '|') + ')'
$c = $b -match $a_regex
Compare-Object -ReferenceObject $c -DifferenceObject $a -IncludeEqual

Upvotes: 2

Views: 135

Answers (1)

Lieven Keersmaekers

Reputation: 58451

The following test script uses a loop and should work for the rules you outlined, as long as names that share a prefix end up next to each other after sorting:

$t = ('first', 'will', 'william', 'williamlong', 'unique', 'lieve', 'lieven')
$s = $t | sort-object

[String[]]$r = @()
$i = 0;
while ($i -lt $s.Count - 1) {
    if ($s[$i+1].StartsWith($s[$i])) {
        $r += $s[$i]
        $r += $s[$i+1]
    }
    $i++
}
$r | Sort-Object -Unique
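
The loop above compares each name only with its immediate neighbour in the sorted list, so for a list such as will, willa, willb it would not report willb. A minimal sketch that compares every pair instead is shown below. It assumes that a "partial match" means one name is a case-insensitive prefix of another; the function name `Get-PrefixDuplicate` and the sample names are hypothetical and not part of the original question.

```powershell
# Hypothetical helper (not from the original answer): returns every name that
# is a prefix of, or is prefixed by, at least one other entry in the list.
function Get-PrefixDuplicate {
    param([string[]]$Names)

    $result = for ($i = 0; $i -lt $Names.Count; $i++) {
        for ($j = 0; $j -lt $Names.Count; $j++) {
            if ($i -eq $j) { continue }   # never compare an entry with itself
            $name  = $Names[$i]
            $other = $Names[$j]
            # Assumption: a "partial match" is a case-insensitive prefix match.
            if ($name.StartsWith($other, [System.StringComparison]::OrdinalIgnoreCase) -or
                $other.StartsWith($name, [System.StringComparison]::OrdinalIgnoreCase)) {
                $name    # emit the name to the pipeline
                break    # one match is enough to keep this name
            }
        }
    }
    # Collapse names that were emitted more than once.
    $result | Sort-Object -Unique
}

Get-PrefixDuplicate -Names 'first', 'will', 'willa', 'willb', 'unique', 'Bill'
```

This compares every pair, so it is quadratic in the number of names. That is probably fine for a short text file, but the sorted-neighbour loop is cheaper on long lists.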

The following test script, which uses a regex, might get you started:

$content = "nomatch`nevenmatch1`nevenmatch12`nunevenmatch1`nunevenmatch12`nunevenmatch123"

$string = (($content.Split("`n") | Sort-Object -Unique) -join "`n")
$regex = [regex] '(?im)^(\w+)(\n\1\w+)+'
$matchdetails = $regex.Match($string)
while ($matchdetails.Success) {
    $matchdetails.Value
    $matchdetails = $matchdetails.NextMatch()
}
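
Each match returned by the regex script above is a multi-line string holding a whole group of related names. As a hedged sketch, the groups can be flattened back into one name per item by splitting each match on the newline. The variable names `$names` and `$duplicates` are illustrative only. Because the input is passed through `Sort-Object -Unique` first, names that are exact duplicates of each other are collapsed and will not be reported by this approach on their own.

```powershell
# Same sample data and pattern as the regex script above.
$names  = 'nomatch', 'evenmatch1', 'evenmatch12', 'unevenmatch1', 'unevenmatch12', 'unevenmatch123'
$string = ($names | Sort-Object -Unique) -join "`n"
$regex  = [regex] '(?im)^(\w+)(\n\1\w+)+'

# Split every multi-line match into its individual names.
$duplicates = foreach ($m in $regex.Matches($string)) {
    $m.Value -split "`n"
}
$duplicates
```

With this sample data, `nomatch` is dropped and the five remaining names are returned as separate strings.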

Upvotes: 3
