Fiddle Freak
Fiddle Freak

Reputation: 2051

Split column value in CSV file using multiple and duplicate delimiters

I have a CSV file that is a mess. I'm trying to use regex to extract the first name and last name from a value in a column in the csv file. The first and last names will have their own columns.

The CSV file (with different combinations of delimiters):

ID,Description,Number
JDo,John Doe - Temp - Client Client Ops,SomeValue
JDo,John  Doe - Temp - Client Client Ops,SomeValue
JDo,John  Doe  - Temp - Client Client Ops,SomeValue
JDo,John  Doe  -  Temp - Client Client Ops,SomeValue
JDo,John  Doe  -  Temp  - Client Client Ops,SomeValue
JDo,John  Doe  -  Temp  -  Client Client Ops,SomeValue
JDo,John  Doe  -  Temp  -  Client  Client Ops,SomeValue
JDo,John  Doe  -  Temp  -  Client  Client  Ops,SomeValue
JDo,John  Doe  -  Temp  -  Client  Client  Ops ,SomeValue
JDo,John  Doe  -  Temp  -  Client  Client  Ops  ,SomeValue
JDo,John Doe-Temp-Client Client Ops,SomeValue
JDo,John  Doe - Temp-Client Client Ops,SomeValue
JDo,John  Doe  - Temp-Client Client Ops,SomeValue
JDo,John  Doe-Temp -  Client Client Ops,SomeValue
JDo,John  Doe  -  Temp  - Client Client Ops,SomeValue
JDo,John  Doe  -  Temp  -  Client Client Ops,SomeValue
JDo,John  Doe  -  Temp  -  Client  Client Ops,SomeValue
JDo,John  Doe  -  Temp  -  Client  Client  Ops,SomeValue
JDo,John  Doe-Temp  -  Client  Client  Ops ,SomeValue
JDo,John  Doe-Temp-Client  Client  Ops  ,SomeValue
JDo,John.Doe - Temp - Client Client Ops,SomeValue
JDo,John .Doe - Temp - Client Client Ops,SomeValue
JDo,John. Doe - Temp - Client Client Ops,SomeValue
JDo,John . Doe - Temp - Client Client Ops,SomeValue
JDo,John.Doe - Temp - Client Client Ops  ,SomeValue
JDo,John .Doe - Temp - Client Client Ops  ,SomeValue
JDo,John. Doe - Temp - Client Client Ops  ,SomeValue
JDo,John . Doe - Temp - Client Client Ops  ,SomeValue
JDo,John.Doe-Temp-Client Client Ops,SomeValue
JDo,John .Doe-Temp-Client Client Ops,SomeValue
JDo,John. Doe-Temp-Client Client Ops,SomeValue
JDo,John . Doe-Temp-Client Client Ops,SomeValue
JDo,John.Doe  - Temp  - Client Client Ops,SomeValue
JDo,John .Doe -  Temp -  Client Client Ops,SomeValue
JDo,John. Doe  -  Temp  -  Client Client Ops,SomeValue
JDo,John . Doe - Temp - Client Client Ops,SomeValue
JDo,John?Doe - Temp - Client Client Ops,SomeValue
JDo,John ?Doe - Temp - Client Client Ops,SomeValue
JDo,John? Doe - Temp - Client Client Ops,SomeValue
JDo,John ? Doe - Temp - Client Client Ops,SomeValue
JDo,John?Doe - Temp - Client Client Ops  ,SomeValue
JDo,John ?Doe - Temp - Client Client Ops  ,SomeValue
JDo,John? Doe - Temp - Client Client Ops  ,SomeValue
JDo,John ? Doe - Temp - Client Client Ops  ,SomeValue
JDo,John?Doe-Temp-Client Client Ops,SomeValue
JDo,John ?Doe-Temp-Client Client Ops,SomeValue
JDo,John? Doe-Temp-Client Client Ops,SomeValue
JDo,John ? Doe-Temp-Client Client Ops,SomeValue
JDo,John?Doe  - Temp  - Client Client Ops,SomeValue
JDo,John ?Doe -  Temp -  Client Client Ops,SomeValue
JDo,John? Doe  -  Temp  -  Client Client Ops,SomeValue
JDo,John ? Doe - Temp - Client Client Ops,SomeValue
JDo,"John,Doe - Temp - Client Client Ops",SomeValue
JDo,"John ,Doe - Temp - Client Client Ops",SomeValue
JDo,"John, Doe - Temp - Client Client Ops",SomeValue
JDo,"John , Doe - Temp - Client Client Ops",SomeValue
JDo,"  John,Doe - Temp - Client Client Ops  ",SomeValue
JDo,"  John ,Doe - Temp - Client Client Ops  ",SomeValue
JDo,"  John, Doe - Temp - Client Client Ops  ",SomeValue
JDo,"  John , Doe - Temp - Client Client Ops  ",SomeValue
JDo,"John,Doe-Temp-Client Client Ops",SomeValue
JDo,"John ,Doe-Temp-Client Client Ops",SomeValue
JDo,"John, Doe-Temp-Client Client Ops",SomeValue
JDo,"John , Doe-Temp-Client Client Ops",SomeValue
JDo,"John,Doe  - Temp  - Client Client Ops",SomeValue
JDo,"John ,Doe -  Temp -  Client Client Ops",SomeValue
JDo,"John, Doe  -  Temp  -  Client Client Ops",SomeValue
JDo,"John , Doe - Temp - Client Client Ops",SomeValue
JDo,John-Doe - Temp - Client Client Ops,SomeValue
JDo,John -Doe - Temp - Client Client Ops,SomeValue
JDo,John- Doe - Temp - Client Client Ops,SomeValue
JDo,John - Doe - Temp - Client Client Ops,SomeValue
JDo,John-Doe - Temp - Client Client Ops  ,SomeValue
JDo,John -Doe - Temp - Client Client Ops  ,SomeValue
JDo,John- Doe - Temp - Client Client Ops  ,SomeValue
JDo,John - Doe - Temp - Client Client Ops  ,SomeValue
JDo,John-Doe-Temp-Client Client Ops,SomeValue
JDo,John -Doe-Temp-Client Client Ops,SomeValue
JDo,John- Doe-Temp-Client Client Ops,SomeValue
JDo,John - Doe-Temp-Client Client Ops,SomeValue
JDo,John-Doe  - Temp  - Client Client Ops,SomeValue
JDo,John -Doe -  Temp -  Client Client Ops,SomeValue
JDo,John- Doe  -  Temp  -  Client Client Ops,SomeValue
JDo,John - Doe - Temp - Client Client Ops,SomeValue

To add the first and last name columns, I am using the following code:

Function FixRxClaimReportAddFirstLastNameColumn {
  Param ($csvFile)

  Write-Host "Adding columns 'First Name' and 'Last Name' to $csvFile"
  Import-Csv $csvFile |
    Select-Object *, @{n='First Name'; e={if ($_.Description) {
        $columnFirstNameValue = $($_.Description -replace '\s+', ' ').split(" ")[0]
        if ($columnFirstNameValue -notlike "*,*" -and $columnFirstNameValue -notmatch '\?' -and $columnFirstNameValue -notlike "*.*" -and $columnFirstNameValue -notlike "*-*") {
          $columnFirstNameValue.Trim()
        } else {
          $columnFirstNameValue2 = $($_.Description -replace '\s+', ' ') -split {$_ -eq "-" -or $_ -eq "- " -or $_ -eq " -" -or $_ -eq " - " -or $_ -eq "," -or $_ -eq ", " -or $_ -eq " ," -or $_ -eq " , " -or $_ -eq "." -or $_ -eq ". " -or $_ -eq " ." -or $_ -eq " . " -or $_ -eq "?" -or $_ -eq "? " -or $_ -eq " ?" -or $_ -eq " ? "}
          $columnFirstNameValue2[0].Trim()
        }
      }}}, @{n='Last Name'; e={if ($_.Description) {
        $columnLastNameValue = $($_.Description -replace '\s+', ' ').split(" ")[1]
        if ($columnLastNameValue -notlike "*,*" -and $columnLastNameValue -notmatch '\?' -and $columnLastNameValue -notlike "*.*" -and $columnLastNameValue -notlike "*-*") {
          $columnLastNameValue.Trim()
        } else {
          $columnLastNameValue2 = $($_.Description -replace '\s+', ' ') -split {$_ -eq "-" -or $_ -eq "- " -or $_ -eq " -" -or $_ -eq " - " -or $_ -eq "," -or $_ -eq ", " -or $_ -eq " ," -or $_ -eq " , " -or $_ -eq "." -or $_ -eq ". " -or $_ -eq " ." -or $_ -eq " . " -or $_ -eq "?" -or $_ -eq "? " -or $_ -eq " ?" -or $_ -eq " ? "}
          $columnLastNameValue2[1].Trim()
        }
      }}} | Export-Csv "$csvFile-Results.csv" -NoTypeInformation -Force
  Write-Host "Complete."
  Write-Host ""
}

FixRxClaimReportAddFirstLastNameColumn 'C:\Scripts\Tests\Test1.csv'

When I run this code, all first name values should be John, and all last name values should be Doe. However, the values are very different for all.

Upvotes: 0

Views: 81

Answers (1)

Ansgar Wiechers
Ansgar Wiechers

Reputation: 200523

You're thinking too complicated. Remove the additional information from the end of the Description field to get just the name, then trim the name and split it into first and last name before adding these as new properties to the input object.

Try this:

Import-Csv 'C:\path\to\input.csv' | ForEach-Object {
  $rawname = $_.Description -replace '-[^-]*-[^-]*$'
  $firstname, $lastname = $rawname.Trim() -split ' *[ \?\.,-] *'
  $_ | Add-Member -Type NoteProperty -Name FirstName -Value $firstname
  $_ | Add-Member -Type NoteProperty -Name LastName -Value $lastname
  $_
} | Export-Csv 'C:\path\to\output.csv' -NoType

Upvotes: 3

Related Questions