user5865015
user5865015

Reputation: 41

Analyzing a CSV in Powershell

I am very new to Powershell(about 1 day to be more precise) and I am having what I assume are some syntax issues with some variables. I have a CSV file spreadsheet(converted from an Excel xlsx) with around 21 columns and 74,000 rows. The four columns of interest to me are columns having to do with an employees start date, a termination date, a department name, and a vice president they report to. I am trying to write a script that will return all employees whom have reached their start date, have not been terminated, work in a department that contains 'HR' in the name, and report to a specific VP. I will elaborate on my specific issues after the block of code.

    $Lawson = Import-Csv .\Documents\Lawson_HR.csv 

PS C:\Users\louiez> $startDate = $Lawson | where {$_.'LAW HIRE DATE' -le (Get-Date -format M-DD-YYYY)} 

PS C:\Users\louiez> $endDate = $startDate | where {$_.'LAW TERM DATE' -eq ''} 

PS C:\Users\louiez> $HR = $endDate | where {$_.'LAW DEPT NAME' -contains 'HR'}

PS C:\Users\louiez> $VP = $endDate | where {$_.'VICE PRESIDENT' -contains 'Croner'}


PS C:\Users\louiez> $startdate | Measure-Object
Count    : 51641
Average  : 
Sum      : 
Maximum  : 
Minimum  : 
Property :  

PS C:\Users\louiez> $enddate | Measure-Object  
Count    : 19428
Average  : 
Sum      : 
Maximum  : 
Minimum  : 
Property :  

PS C:\Users\louiez> $HR | Measure-Object
Count    : 0
Average  : 
Sum      : 
Maximum  : 
Minimum  : 
Property :  

First, the startDate variable does not count the correct amount of items. I would like for it to count all rows in which the employee hire date is before today's date. the code in its current for returns about 51k items, it should be around 73k. (The endDate variable functions as it should.)

Second the HR variable returns 0 items, it should be several hundred. I would like for it to search the Dept Name field in each row for any instance on the letters 'HR'. Similarly I would like for the VP variable to return all items in which the Vice President column has a given name(in this case, Croner).

As I said, I am incredibly new to Powershell and have some very limited programming experience so I am not sure what in the syntax is causing these errors.

Upvotes: 1

Views: 164

Answers (1)

TessellatingHeckler
TessellatingHeckler

Reputation: 28983

There are a couple of flaws in your design, the easy one:

$_.'LAW DEPT NAME' -contains 'HR'
$_.'VICE PRESIDENT' -contains 'Croner'

-contains is un-intuitive, it does not match text content, it matches items in a container of multiple items. Help about_Comparison_Operators for details. Use -match instead.

$_.'LAW DEPT NAME' -match 'HR'
$_.'VICE PRESIDENT' -match 'Croner'

The second is more complex:

$_.'LAW HIRE DATE' -le (Get-Date -format M-DD-YYYY)

$_.'LAW HIRE DATE' is probably going to return you a string of text, get-date with the -format parameter will return you a string of text, and -le will do alphabetical order sorting (with adjustments), which will be completely unreliable, saying the 1st Feb comes before 2nd Jan because it starts with 1.

Alphabetical order sorting is more workable on a date format like yyyy-MM-dd, but as wOxxOm comments, the right way is to process the date in the CSV into a [datetime] object and then compare that with the current date as a [datetime] object. This will work more reliably for comparisons (give or take timezone and daylight savings time considerations).

[datetime]::ParseExact($_.'LAW HIRE DATE', 'dd-MM-yyyy', $null) -le (Get-Date)

Assuming that the LAW HIRE DATE is always and exactly in the format dd-MM-yyyy, otherwise this will fall over and you'll have to adjust to fit your data - or adjust your spreadsheet to fit your code.

Upvotes: 4

Related Questions