Reputation: 2051
I have a CSV file that is a mess. I'm trying to use regex to extract the first name and last name from a value in a column in the csv file. The first and last names will have their own columns.
The CSV file (with different combinations of delimiters):
ID,Description,Number JDo,John Doe - Temp - Client Client Ops,SomeValue JDo,John Doe - Temp - Client Client Ops,SomeValue JDo,John Doe - Temp - Client Client Ops,SomeValue JDo,John Doe - Temp - Client Client Ops,SomeValue JDo,John Doe - Temp - Client Client Ops,SomeValue JDo,John Doe - Temp - Client Client Ops,SomeValue JDo,John Doe - Temp - Client Client Ops,SomeValue JDo,John Doe - Temp - Client Client Ops,SomeValue JDo,John Doe - Temp - Client Client Ops ,SomeValue JDo,John Doe - Temp - Client Client Ops ,SomeValue JDo,John Doe-Temp-Client Client Ops,SomeValue JDo,John Doe - Temp-Client Client Ops,SomeValue JDo,John Doe - Temp-Client Client Ops,SomeValue JDo,John Doe-Temp - Client Client Ops,SomeValue JDo,John Doe - Temp - Client Client Ops,SomeValue JDo,John Doe - Temp - Client Client Ops,SomeValue JDo,John Doe - Temp - Client Client Ops,SomeValue JDo,John Doe - Temp - Client Client Ops,SomeValue JDo,John Doe-Temp - Client Client Ops ,SomeValue JDo,John Doe-Temp-Client Client Ops ,SomeValue JDo,John.Doe - Temp - Client Client Ops,SomeValue JDo,John .Doe - Temp - Client Client Ops,SomeValue JDo,John. Doe - Temp - Client Client Ops,SomeValue JDo,John . Doe - Temp - Client Client Ops,SomeValue JDo,John.Doe - Temp - Client Client Ops ,SomeValue JDo,John .Doe - Temp - Client Client Ops ,SomeValue JDo,John. Doe - Temp - Client Client Ops ,SomeValue JDo,John . Doe - Temp - Client Client Ops ,SomeValue JDo,John.Doe-Temp-Client Client Ops,SomeValue JDo,John .Doe-Temp-Client Client Ops,SomeValue JDo,John. Doe-Temp-Client Client Ops,SomeValue JDo,John . Doe-Temp-Client Client Ops,SomeValue JDo,John.Doe - Temp - Client Client Ops,SomeValue JDo,John .Doe - Temp - Client Client Ops,SomeValue JDo,John. Doe - Temp - Client Client Ops,SomeValue JDo,John . Doe - Temp - Client Client Ops,SomeValue JDo,John?Doe - Temp - Client Client Ops,SomeValue JDo,John ?Doe - Temp - Client Client Ops,SomeValue JDo,John? Doe - Temp - Client Client Ops,SomeValue JDo,John ? Doe - Temp - Client Client Ops,SomeValue JDo,John?Doe - Temp - Client Client Ops ,SomeValue JDo,John ?Doe - Temp - Client Client Ops ,SomeValue JDo,John? Doe - Temp - Client Client Ops ,SomeValue JDo,John ? Doe - Temp - Client Client Ops ,SomeValue JDo,John?Doe-Temp-Client Client Ops,SomeValue JDo,John ?Doe-Temp-Client Client Ops,SomeValue JDo,John? Doe-Temp-Client Client Ops,SomeValue JDo,John ? Doe-Temp-Client Client Ops,SomeValue JDo,John?Doe - Temp - Client Client Ops,SomeValue JDo,John ?Doe - Temp - Client Client Ops,SomeValue JDo,John? Doe - Temp - Client Client Ops,SomeValue JDo,John ? Doe - Temp - Client Client Ops,SomeValue JDo,"John,Doe - Temp - Client Client Ops",SomeValue JDo,"John ,Doe - Temp - Client Client Ops",SomeValue JDo,"John, Doe - Temp - Client Client Ops",SomeValue JDo,"John , Doe - Temp - Client Client Ops",SomeValue JDo," John,Doe - Temp - Client Client Ops ",SomeValue JDo," John ,Doe - Temp - Client Client Ops ",SomeValue JDo," John, Doe - Temp - Client Client Ops ",SomeValue JDo," John , Doe - Temp - Client Client Ops ",SomeValue JDo,"John,Doe-Temp-Client Client Ops",SomeValue JDo,"John ,Doe-Temp-Client Client Ops",SomeValue JDo,"John, Doe-Temp-Client Client Ops",SomeValue JDo,"John , Doe-Temp-Client Client Ops",SomeValue JDo,"John,Doe - Temp - Client Client Ops",SomeValue JDo,"John ,Doe - Temp - Client Client Ops",SomeValue JDo,"John, Doe - Temp - Client Client Ops",SomeValue JDo,"John , Doe - Temp - Client Client Ops",SomeValue JDo,John-Doe - Temp - Client Client Ops,SomeValue JDo,John -Doe - Temp - Client Client Ops,SomeValue JDo,John- Doe - Temp - Client Client Ops,SomeValue JDo,John - Doe - Temp - Client Client Ops,SomeValue JDo,John-Doe - Temp - Client Client Ops ,SomeValue JDo,John -Doe - Temp - Client Client Ops ,SomeValue JDo,John- Doe - Temp - Client Client Ops ,SomeValue JDo,John - Doe - Temp - Client Client Ops ,SomeValue JDo,John-Doe-Temp-Client Client Ops,SomeValue JDo,John -Doe-Temp-Client Client Ops,SomeValue JDo,John- Doe-Temp-Client Client Ops,SomeValue JDo,John - Doe-Temp-Client Client Ops,SomeValue JDo,John-Doe - Temp - Client Client Ops,SomeValue JDo,John -Doe - Temp - Client Client Ops,SomeValue JDo,John- Doe - Temp - Client Client Ops,SomeValue JDo,John - Doe - Temp - Client Client Ops,SomeValue
To add the first and last name columns, I am using the following code:
Function FixRxClaimReportAddFirstLastNameColumn {
Param ($csvFile)
Write-Host "Adding columns 'First Name' and 'Last Name' to $csvFile"
Import-Csv $csvFile |
Select-Object *, @{n='First Name'; e={if ($_.Description) {
$columnFirstNameValue = $($_.Description -replace '\s+', ' ').split(" ")[0]
if ($columnFirstNameValue -notlike "*,*" -and $columnFirstNameValue -notmatch '\?' -and $columnFirstNameValue -notlike "*.*" -and $columnFirstNameValue -notlike "*-*") {
$columnFirstNameValue.Trim()
} else {
$columnFirstNameValue2 = $($_.Description -replace '\s+', ' ') -split {$_ -eq "-" -or $_ -eq "- " -or $_ -eq " -" -or $_ -eq " - " -or $_ -eq "," -or $_ -eq ", " -or $_ -eq " ," -or $_ -eq " , " -or $_ -eq "." -or $_ -eq ". " -or $_ -eq " ." -or $_ -eq " . " -or $_ -eq "?" -or $_ -eq "? " -or $_ -eq " ?" -or $_ -eq " ? "}
$columnFirstNameValue2[0].Trim()
}
}}}, @{n='Last Name'; e={if ($_.Description) {
$columnLastNameValue = $($_.Description -replace '\s+', ' ').split(" ")[1]
if ($columnLastNameValue -notlike "*,*" -and $columnLastNameValue -notmatch '\?' -and $columnLastNameValue -notlike "*.*" -and $columnLastNameValue -notlike "*-*") {
$columnLastNameValue.Trim()
} else {
$columnLastNameValue2 = $($_.Description -replace '\s+', ' ') -split {$_ -eq "-" -or $_ -eq "- " -or $_ -eq " -" -or $_ -eq " - " -or $_ -eq "," -or $_ -eq ", " -or $_ -eq " ," -or $_ -eq " , " -or $_ -eq "." -or $_ -eq ". " -or $_ -eq " ." -or $_ -eq " . " -or $_ -eq "?" -or $_ -eq "? " -or $_ -eq " ?" -or $_ -eq " ? "}
$columnLastNameValue2[1].Trim()
}
}}} | Export-Csv "$csvFile-Results.csv" -NoTypeInformation -Force
Write-Host "Complete."
Write-Host ""
}
FixRxClaimReportAddFirstLastNameColumn 'C:\Scripts\Tests\Test1.csv'
When I run this code, all first name values should be John, and all last name values should be Doe. However, the values are very different for all.
Upvotes: 0
Views: 81
Reputation: 200523
You're thinking too complicated. Remove the additional information from the end of the Description
field to get just the name, then trim the name and split it into first and last name before adding these as new properties to the input object.
Try this:
Import-Csv 'C:\path\to\input.csv' | ForEach-Object {
$rawname = $_.Description -replace '-[^-]*-[^-]*$'
$firstname, $lastname = $rawname.Trim() -split ' *[ \?\.,-] *'
$_ | Add-Member -Type NoteProperty -Name FirstName -Value $firstname
$_ | Add-Member -Type NoteProperty -Name LastName -Value $lastname
$_
} | Export-Csv 'C:\path\to\output.csv' -NoType
Upvotes: 3