Reputation: 821
I'm trying to write a part of a script that replaces a match line with RegEx.
Here's what the input looks like:
Name, type, ADDRESSES
“Aaa”, “bbb”, “19 S 149TH $NEWPORT NEWS, WA 96332”
“Aaa”, “bbb”, “851 16TH AVE #365$SALISH, WA 98402-4410”
“Aaa”, “bbb”, “2445 E BROADWAY #204$YELM WA 98653”
Here's what I've tried
$regex = '\d{5}([ \-]\d{4})?'
##get the data
$people = Get-Content 'C:\test.csv'
## let's convert the data first
foreach ($p in $people) {
if ($p -match $regex) { $p | out-file -append C:\test.csv }
}
Here's what I expect back
Name, type, ADDRESSES
“Aaa”, “bbb”, “96332”
“Aaa”, “bbb”, “98402-4410”
“Aaa”, “bbb”, “98653”
Here's what I get back:
Name, type, ADDRESSES
“Aaa”, “bbb”, “19 S 149TH $NEWPORT NEWS, WA 96332”
“Aaa”, “bbb”, “851 16TH AVE #365$SALISH, WA 98402-4410”
“Aaa”, “bbb”, “2445 E BROADWAY #204$YELM WA 98653”
Upvotes: 0
Views: 138
Reputation: 61028
Here's my 2 cents on this:
$csv = Import-Csv -Path 'theOriginal.csv' | ForEach-Object {
$_ | Select-Object *, @{Name = 'ADDRESSES'; Expression = { $_.ADDRESSES -replace '.*\s([-\d]+)$', '$1'}} -ExcludeProperty ADDRESSES
}
# output on screen
$csv
# write to file
$csv | Export-Csv -Path 'theUpdated.csv' -NoTypeInformation
Result:
Name type ADDRESSES
---- ---- ---------
Aaa bbb 96332
Aaa bbb 98402-4410
Aaa bbb 98653
Regex details:
. Match any single character that is not a line break character
* Between zero and unlimited times, as many times as possible, giving back as needed (greedy)
\s Match a single character that is a “whitespace character” (spaces, tabs, line breaks, etc.)
( Match the regular expression below and capture its match into backreference number 1
[-\d] Match a single character present in the list below
The character “-”
A single digit 0..9
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
)
$ Assert position at the end of the string (or before the line break at the end of the string, if any)
Upvotes: 0
Reputation: 27418
This works for me. Just replace everything up to 5 digits with only the 5 digits. It still works if there's a 5 digit number in the beginning. https://javascript.info/regexp-greedy-and-lazy
import-csv file.csv |
% { $_.addresses = $_.addresses -replace '.*(\d{5})', '$1'; $_ }
Name type ADDRESSES
---- ---- ---------
Aaa bbb 96332
Aaa bbb 98402-4410
Aaa bbb 98653
Upvotes: 1
Reputation: 5114
To continue from comments, since the csv data is not in good form it might be better to use a different regex and -replace
to modify the data.
$file = 'c:\temp\test.csv'
# add test data to a file
@'
Name, type, ADDRESSES
Aaa, bbb, 19 S 149TH $NEWPORT NEWS, WA 96332
Aaa, bbb, 851 16TH AVE #365$SALISH, WA 98402-4410
Aaa, bbb, 2445 E BROADWAY #204$YELM WA 98653
'@ | Set-Content $file
$regex = ',[ \w$#]+,?[ \w]+(\d{5}(?:\-\d+)?)$'
# This line will read in the file, skipping the header line.
# Then it will perform a replace using the regex above
# substituting whatever is matched with the first matching group (\d{5}(?:\-\d+).
# Finally the lines are appended to the end of the file
(Get-Content $file | Select-Object -Skip 1) -replace $regex, ', $1' | Add-Content -Path $file
# Get-Content to check our file
Get-Content $file
Output
Name, type, ADDRESSES
Aaa, bbb, 19 S 149TH $NEWPORT NEWS, WA 96332
Aaa, bbb, 851 16TH AVE #365$SALISH, WA 98402-4410
Aaa, bbb, 2445 E BROADWAY #204$YELM WA 98653
Aaa, bbb, 96332
Aaa, bbb, 98402-4410
Aaa, bbb, 98653
Upvotes: 1
Reputation: 59782
This should work.
$text = @'
Name, type, ADDRESSES
Aaa, bbb, 19 S 149TH $NEWPORT NEWS, WA 96332
Aaa, bbb, 851 16TH AVE #365$SALISH, WA 98402-4410
Aaa, bbb, 2445 E BROADWAY #204$YELM WA 98653
'@ -split '\r?\n' | Select-Object -Skip 1
$result = $text.ForEach({
$name, $type, $addresses = $_.Split(',',3)
$addresses = [regex]::Matches($addresses, '[\d-]+(?=$)').Value
[pscustomobject]@{
Name = $name
Type = $type
Addresses = $addresses
}
})
Name Type Addresses
---- ---- ---------
Aaa bbb 96332
Aaa bbb 98402-4410
Aaa bbb 98653
Upvotes: 1