choooper2006

Reputation: 63

PowerShell Get-Random with Constraints

I'm currently using PowerShell's Get-Random cmdlet to randomly pull a set number of rows from a CSV file. I need to add a constraint: if a row with a given ID is pulled, the other rows with the same ID must be pulled as well.

Here is what I currently have:

$chosenOnes = Import-CSV C:\Temp\pk2.csv | sort{Get-Random} | Select -first 6

$i = 1

$count = $chosenOnes | Group-Object householdID


foreach ($row in $count)
{
    if ($row.count -gt 1)
    {
        $students = $row.Group.Student

        foreach ($student in $students)
        {
            $name = $student.tostring()

            #...do something

            $i = $i + 1
        }
    }
    else
    {
        $name = $row.Group.Student

        if($i -le 5)
        {
            #...do something
        }
        else
        {
            #...do something
        }
        $i = $i + 1
    }
}

Example dataset

ID,name
165,Ernest Hemingway
1204,Mark Twain
1578,Stephen King
1634,Charles Dickens
1726,George Orwell
7751,John Doe
7751,Tim Doe

In this example there are 7 rows, but my code randomly selects 6. What needs to happen is that when a row with ID=7751 is selected, both rows with ID=7751 are returned. The IDs cannot be hard-coded.
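To experiment without a file on disk, the sample dataset can be loaded inline with ConvertFrom-Csv (a sketch assuming the same ID,name columns shown above, in place of C:\Temp\pk2.csv). Group-Object then shows which IDs occur on more than one row, which is exactly the set of IDs the constraint is about.

```powershell
# Sample data from the question, loaded inline instead of from a CSV file
$allRows = @'
ID,name
165,Ernest Hemingway
1204,Mark Twain
1578,Stephen King
1634,Charles Dickens
1726,George Orwell
7751,John Doe
7751,Tim Doe
'@ | ConvertFrom-Csv

# Group rows by ID and keep only the IDs that appear on more than one row
$sharedIDs = $allRows | Group-Object ID | Where-Object Count -gt 1 | ForEach-Object Name
$sharedIDs
```

With this data, only 7751 is reported, and nothing in the code names that ID explicitly.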

Upvotes: 1

Views: 1542

Answers (2)

mklement0

Reputation: 438123

Use Get-Random directly, with -Count, to extract a given number of random elements from a collection.
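As a quick illustration on a plain integer range: Get-Random -Count picks each input element at most once, and if -Count exceeds the number of input elements, it returns all of them in random order.

```powershell
# Pick 3 random elements from 1..10; no element is picked twice
$picked = 1..10 | Get-Random -Count 3
$picked

# Asking for more elements than exist returns all of them, shuffled
$all = 1..4 | Get-Random -Count 10
$all
```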

$allRows = Import-CSV C:\Temp\pk2.csv

$chosenHouseholdIDs = ($allRows | Get-Random -Count 6).householdID

Then filter all rows, keeping those whose householdID value is among the householdID values of the 6 randomly selected rows, using the -in array-containment operator (PSv3+ syntax):

$allRows | Where-Object householdID -in $chosenHouseholdIDs
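Put together and run against the question's sample data, the approach looks roughly like this. The sketch assumes the column is named ID, as in the sample dataset, rather than householdID, and loads the data inline instead of from a file.

```powershell
# Question's sample data (assumed column name: ID)
$allRows = @'
ID,name
165,Ernest Hemingway
1204,Mark Twain
1578,Stephen King
1634,Charles Dickens
1726,George Orwell
7751,John Doe
7751,Tim Doe
'@ | ConvertFrom-Csv

# Randomly pick 6 rows and collect their IDs (member enumeration, PSv3+)
$chosenIDs = ($allRows | Get-Random -Count 6).ID

# Single pass over all rows: keep every row whose ID was chosen,
# so rows that share an ID are always returned together
$result = $allRows | Where-Object ID -in $chosenIDs
$result
```

Because rows sharing an ID are pulled in together, the result can contain more than 6 rows: here it has 7 whenever only one of the two 7751 rows was among the 6 drawn.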

Optional reading: performance considerations:

$allRows | Get-Random -Count 6 is not only conceptually simpler but also much faster than $allRows | Sort-Object { Get-Random } | Select-Object -First 6.

Comparing the two approaches with the Time-Command function, using a 1000-row test file with 10 columns, yields the following sample timings on my Windows 10 VM in Windows PowerShell. Note that the Sort-Object { Get-Random }-based solution is more than 15(!) times slower:

Factor Secs (100-run avg.) Command                                                        TimeSpan
------ ------------------- -------                                                        --------
1.00   0.007               $allRows | Get-Random -Count 6                                 00:00:00.0072520
15.65  0.113               $allRows | Sort-Object { Get-Random } | Select-Object -First 6 00:00:00.1134909
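Time-Command is not a built-in cmdlet. With only built-in tooling, a rougher comparison can be sketched with Measure-Command. The 1000-row, 10-column data below is synthetic (an assumption standing in for the test file), and the timings you get will depend on your machine and PowerShell version.

```powershell
# Synthetic test data (assumption): 1000 objects with 10 properties each
$allRows = foreach ($n in 1..1000) {
    $props = [ordered]@{}
    foreach ($c in 1..10) { $props["Col$c"] = "r$n-c$c" }
    [pscustomobject]$props
}

$runs = 10
# Average elapsed milliseconds of each approach over several runs
$direct = (1..$runs | ForEach-Object {
    (Measure-Command { $allRows | Get-Random -Count 6 }).TotalMilliseconds
} | Measure-Object -Average).Average

$sorted = (1..$runs | ForEach-Object {
    (Measure-Command { $allRows | Sort-Object { Get-Random } | Select-Object -First 6 }).TotalMilliseconds
} | Measure-Object -Average).Average

'Get-Random -Count : {0:N2} ms' -f $direct
'Sort-Object       : {0:N2} ms' -f $sorted
```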

Similarly, a single pass through all rows that finds matching IDs via the array-containment operator -in performs much better than looping over the randomly selected IDs and searching all rows for each one.
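The two strategies can be contrasted in a small sketch. It uses the question's sample data with an assumed ID column and a hypothetical fixed selection of IDs so the output is predictable. Both variants return the same rows, but the second one runs a full Where-Object scan over all rows once per chosen ID.

```powershell
$allRows = @'
ID,name
165,Ernest Hemingway
1204,Mark Twain
1578,Stephen King
1634,Charles Dickens
1726,George Orwell
7751,John Doe
7751,Tim Doe
'@ | ConvertFrom-Csv

# Hypothetical fixed selection instead of Get-Random, for a predictable result
$chosenIDs = '1204', '7751'

# Single pass: each row is tested once against the chosen IDs
$singlePass = $allRows | Where-Object ID -in $chosenIDs

# Per-ID loop: scans all rows again for every chosen ID
$perID = foreach ($id in $chosenIDs) {
    $allRows | Where-Object ID -eq $id
}

$singlePass.name
$perID.name
```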

Upvotes: 2

ArcSet

Reputation: 6860

I tried to stick with your starting point and came up with this:

$Array = Import-CSV C:\test\StudtentTest.csv
$Array | Sort{Get-Random} | select -first 2 | %{
    $id = $_.id
    $Array | ?{$_.id -eq $id} | %{
        $_
    }
}

$Array holds your parsed CSV.

We pipe $Array into Sort-Object with a random sort key and select the first 2 rows (in this case). For each selected row, we save its ID in $id, then search the array for that ID and output every row that matches.

If one of the selected rows shares its ID with other rows, you end up with something like:

ID   name           
--   ----           
7751 John Doe       
7751 Tim Doe        
1634 Charles Dickens
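One caveat: if two of the randomly selected rows share an ID (for example, both 7751 rows), each triggers its own lookup and the matching rows are output twice. A sketch that removes duplicate IDs before the lookup, using the question's sample data loaded inline (assumed ID column):

```powershell
$Array = @'
ID,name
165,Ernest Hemingway
1204,Mark Twain
1578,Stephen King
1634,Charles Dickens
1726,George Orwell
7751,John Doe
7751,Tim Doe
'@ | ConvertFrom-Csv

# Pick 2 random rows, reduce them to their distinct IDs,
# then output every row that has one of those IDs
$result = $Array | Sort-Object { Get-Random } | Select-Object -First 2 |
    ForEach-Object { $_.ID } | Select-Object -Unique |
    ForEach-Object {
        $id = $_
        $Array | Where-Object { $_.ID -eq $id }
    }
$result
```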

Upvotes: 1
