Reputation: 29
I have several JSON files I'm attempting to parse through that need some strings cleaned up.
The strings that need to be cleaned are stored in the variable $text, mixed in with human-readable text.
The strings all look similar to this.
<mailto:[email protected]|[email protected]>
<tel:469-867-5309|469-867-5309>
<tel:6168675309|(616) 867-5309>
There are thousands of these that need to be cleaned up to show emails as just the plain address
and phone numbers like
469-867-5309
(616) 867-5309
I've made some progress using -replace with several variations of the following. However, I'm still learning PowerShell, so I'm certain there is a better way:
-replace 'mailto:',""
-replace 'tel:',""
-replace '[<>]',""
-replace '[|]'," "
The remainder of the data being parsed is human-readable text conversations with the occasional hyperlink. I would obviously like to not affect messages at random and focus strictly on providing the email address and phone numbers as human-readable nonduplicate data.
Example $text below
"text": "*FOR JAMES:* Chase: <tel:8178675309|817-867-5309>. Has talked with you before about his 4 parks. The current one he is wanting insured has. Looking for more coverage than just GL coverage. All he has right now is if were to trip and fall he would be covered, nothing more than that. Any time before noon he is available.",
"text": "New task for Jeremi \nPriority: Low\nDue Date: Aug 19, 2020 01:00 PM\nTask Description: Hey James, Jeremi has been in the “Working” stage of your pipeline for 4 weeks now. Please address the prospect, or delete this task if they are in this phase still for good reason.\n\nEmail for follow up: <mailto:[email protected]|[email protected]>",
Thanks for any help.
Upvotes: 0
Views: 88
Reputation: 8868
I would've loved to use ConvertFrom-String for this, but it requires at least semi-structured, consistent data. So regex to the rescue! Here's what I came up with. All your examples show the same data, just in different formats. If there is a likelihood that a single line will contain more than one unique phone number or email, let me know and I'll tweak this.
For this test I used your two sample lines and treated them as completely separate. If I misunderstood, please correct me.
$data = @'
"text": "*FOR JAMES:* Chase: <tel:8178675309|817-867-5309>. Has talked with you before about his 4 parks. The current one he is wanting insured has. Looking for more coverage than just GL coverage. All he has right now is if were to trip and fall he would be covered, nothing more than that. Any time before noon he is available.",
"text": "New task for Jeremi \nPriority: Low\nDue Date: Aug 19, 2020 01:00 PM\nTask Description: Hey James, Jeremi has been in the “Working” stage of your pipeline for 4 weeks now. Please address the prospect, or delete this task if they are in this phase still for good reason.\n\nEmail for follow up: <mailto:[email protected]|[email protected]>",
'@ -split "`n"
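$data | foreach {
    switch -Regex ($_) {
        # Capture both halves of the tag; select -Unique collapses them when they are identical.
        '<mailto:([^|>]+)\|([^>]+)>' {$email = 1..2 | foreach {$matches.$_} | select -Unique}
        # Strip hyphens, parentheses and spaces so both halves compare as bare digits.
        '<tel:([^|>]+)\|([^>]+)>' {$phone = 1..2 | foreach {$matches.$_ -replace '[-()\s]'} | select -Unique}
    }
    if($phone -or $email)
    {
        [PSCustomObject]@{
            Email = $email
            Phone = $phone
        }
    }
    # Clear the variables so values from one line do not leak into the next.
    Remove-Variable email,phone -ErrorAction SilentlyContinue
}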
Output
Email             Phone
-----             -----
                  8178675309
[email protected]
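The switch only sees the first match of each pattern on a line. If a line can hold several tags, a rough sketch along these lines could collect every one of them. It assumes every tag follows the <scheme:target|display> shape from the question; $line is a made-up sample, and the property names Type and Value are only illustrative.

```powershell
# Sketch: collect every <mailto:...> and <tel:...> tag in a line, not just the first.
# $line is a hypothetical sample built from the phone numbers in the question.
$line = 'Call <tel:4698675309|469-867-5309> or <tel:6168675309|(616) 867-5309>.'

# Named groups: scheme (mailto or tel), target (before the pipe), display (after the pipe).
$pattern = '<(?<scheme>mailto|tel):(?<target>[^|>]+)\|(?<display>[^>]+)>'

$found = foreach ($m in [regex]::Matches($line, $pattern)) {
    [PSCustomObject]@{
        Type  = $m.Groups['scheme'].Value
        Value = $m.Groups['display'].Value   # keep the human-readable half
    }
}

$found
```

For the sample line this yields two objects, both of type tel, with the values 469-867-5309 and (616) 867-5309.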
The -replace in the tel branch strips the formatting from the phone number. If you want a consistent format instead, you can do something like this.
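$data | foreach {
    switch -Regex ($_) {
        # Capture both halves of the tag; select -Unique collapses them when they are identical.
        '<mailto:([^|>]+)\|([^>]+)>' {$email = 1..2 | foreach {$matches.$_} | select -Unique}
        # Strip hyphens, parentheses and spaces so both halves compare as bare digits.
        '<tel:([^|>]+)\|([^>]+)>' {$phone = 1..2 | foreach {$matches.$_ -replace '[-()\s]'} | select -Unique}
    }
    if($phone -or $email)
    {
        [PSCustomObject]@{
            Email = $email
            # Assumes a single 10-digit number: 8178675309 becomes 817-867-5309.
            Phone = if($phone){$phone.Insert(3,'-').Insert(7,'-')}
        }
    }
    # Clear the variables so values from one line do not leak into the next.
    Remove-Variable email,phone -ErrorAction SilentlyContinue
}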
Email             Phone
-----             -----
                  817-867-5309
[email protected]
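If the goal is to clean the text in place rather than pull the values out into objects, a single -replace with a capture group may be enough. This is only a sketch: it assumes every tag has the <scheme:target|display> shape and that the display half (after the pipe) is the one to keep. The sample strings are adapted from the question, and user@example.com is a placeholder address.

```powershell
# Sketch: replace each "<mailto:...|...>" or "<tel:...|...>" tag with its display half.
# The samples are adapted from the question; user@example.com is a placeholder.
$samples = @(
    'Chase: <tel:8178675309|817-867-5309>. Has talked with you before.'
    'Call <tel:6168675309|(616) 867-5309> before noon.'
    'Email for follow up: <mailto:user@example.com|user@example.com>'
    'Plain text with a | pipe and <angle brackets> stays as it is.'
)

# Only tags that start with mailto: or tel: match, so other text is left alone.
$pattern = '<(?:mailto|tel):[^|>]+\|([^>]+)>'

# Single quotes keep PowerShell from expanding $1; the regex engine substitutes group 1.
$cleaned = $samples -replace $pattern, '$1'
$cleaned
```

Because the pattern is anchored on the mailto: and tel: prefixes, stray pipes, angle brackets and hyperlinks elsewhere in a message are not touched, which the four separate -replace calls from the question cannot guarantee.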
Upvotes: 1