Reputation: 1139
I need to parse a CSV file like in the attached image, and then generate the summary report based on the frequency of the specific keywords in the message_subject column like 'Auto Reply:'
Source Input.CSV: https://pastebin.com/3iaUMCPd
The expected result is:
Sender_Address, Frequency, Date (Ascending sorted)
[email protected], 3, 2022-09-27, 2022-09-28
[email protected], 2, 2022-10-04, 2022-09-30
[email protected], 1, 2022-10-06
How to achieve that?
Upvotes: 0
Views: 120
Reputation: 174485
Use Group-Object
to group the rows by sender address, then use Sort-Object -Unique
to grab all unique dates:
Import-Csv path\to\file.csv |Group-Object sender_address |ForEach-Object {
# Parse timestamps, grab unique dates
$uniqueDates = $_.Group |ForEach-Object { [datetime]::Parse($_.origin_timestamp_utc).Date.ToString('yyyy-MM-dd') } |Sort-Object -Unique
# Create a new object to hold the frequency info
[pscustomobject]@{
Sender_address = $_.Name
Frequency = $_.Count
Dates = $uniqueDates -join ', '
}
} |Export-Csv path\to\result.csv -NoTypeInformation
Upvotes: 3
Reputation: 151
This solves your problem, including ascending sorting for the dates:
# gets all messages starting with "Auto Reply"
$messages = Import-Csv <your file> | Where-Object message_subject -match "Auto Reply.*"
# changes timestamp format to date only
$messages | ForEach-Object { $_.origin_timestamp_utc = [datetime]::Parse($_.origin_timestamp_utc).Date.ToString('yyyy-MM-dd') }
#groups by sender_adress
$messages | Group-Object sender_address | ForEach-Object {
$sortedDates = ""
# sorts dates so the date with the least amount of messages comes first (you can change this with Sort-Object -descending)
$_.group | Group-Object origin_timestamp_utc | Sort-Object count | ForEach-Object {
$sortedDates += $_.Name + ","
}
[pscustomobject]@{
Sender_Address = $_.Name
Frequency = $_.Count
"Date (Ascending Sorted)" = $sortedDates.TrimEnd(",") #remove last ","
}
}
Add Export-Csv
to export it
Upvotes: 1
Reputation: 27756
Here is my take on the problem:
# Create sample data
$csv = ConvertFrom-Csv @'
origin_timestamp_utc,sender_address,message_subject,directionality,connector_id,delivery_priority
2022-09-27T19:38:02.1826859Z,[email protected],Customer Vehicle Conditions report for 20220927,Originating,,Normal
2022-09-27T19:38:02.2054022Z,[email protected],Auto Reply: Vehicle Reports are due now.,Originating,,Normal
2022-09-28T19:38:03.0335704Z,[email protected],Auto Reply: Late in the office.,Originating,,Normal
2022-09-28T19:38:03.0629852Z,[email protected],Auto Reply: Sick not well :-(,Originating,,Normal
2022-10-05T19:38:02.1819402Z,[email protected],Customer Vehicle Conditions report for 20221005,Originating,,Normal
2022-10-05T19:38:02.1903258Z,[email protected],Customer Vehicle Conditions report for 20221005,Originating,,Normal
2022-10-06T19:38:02.2743619Z,[email protected],Customer Vehicle Conditions report for 20221006,Originating,,Normal
2022-10-06T19:38:02.2821327Z,[email protected],Customer Vehicle Conditions report for 20221006,Originating,,Normal
2022-09-29T19:38:02.1361255Z,[email protected],Customer Vehicle Conditions report for 20220929,Originating,,Normal
2022-09-29T19:38:02.1453714Z,[email protected],Customer Vehicle Conditions report for 20220929,Originating,,Normal
2022-09-30T19:38:02.8562799Z,[email protected],Auto Reply: Out of office today?!??!,Originating,,Normal
2022-09-30T19:38:02.8979890Z,[email protected],Customer Vehicle Conditions report for 20220930,Originating,,Normal
2022-10-03T19:38:02.5937670Z,[email protected],Customer Vehicle Conditions report for 20221003,Originating,,Normal
2022-10-04T02:33:07.6877938Z,[email protected],Auto Reply: Not in the Office ?,Originating,,Normal
2022-10-04T19:38:02.6117533Z,[email protected],Customer Vehicle Conditions report for 20221004,Originating,,Normal
2022-10-01T19:38:03.3718620Z,[email protected],Customer Vehicle Conditions report for 20221001,Originating,,Normal
2022-10-01T19:38:04.8114745Z,[email protected],Customer Vehicle Conditions report for 20221001,Originating,,Normal
2022-10-02T19:38:02.2517865Z,[email protected],Customer Vehicle Conditions report for 20221002,Originating,,Normal
2022-10-02T19:38:02.2855767Z,[email protected],Customer Vehicle Conditions report for 20221002,Originating,,Normal
2022-10-03T19:38:02.5937670Z,[email protected],Customer Vehicle Conditions report for 20221003,Originating,,Normal
2022-10-04T19:38:02.6383853Z,[email protected],Customer Vehicle Conditions report for 20221004,Originating,,Normal
2022-10-06T23:30:12.3284393Z,[email protected],Auto Reply: Where are you !!!!,Originating,,Normal
'@
$keyWord = 'Auto Reply:*'
$csv | Group-Object sender_address | ForEach-Object {
# From current group, get all date/times where the message_subject matches the keyword
$dates = $_.Group.Where{ $_.message_subject -like $keyWord }.ForEach{ $_.origin_timestamp_utc }
# Remove the time part and get unique dates
$uniqueDates = $dates -replace 'T.+' | Sort-Object -Unique
# Output
[PSCustomObject]@{
Sender_Address = $_.Name
Frequency = $dates.Count
'Date (Ascending sorted)' = $uniqueDates -join ', '
}
}
Output:
Sender_Address Frequency Date (Ascending sorted)
-------------- --------- -----------------------
[email protected] 3 2022-09-27, 2022-09-28
[email protected] 2 2022-09-30, 2022-10-04
[email protected] 1 2022-10-06
Compared to the expected output, there is a slight difference in the Date
column, 2nd row. I have interpreted the "ascending sorted" to be applied to each row individually. I'm not sure if OP actually meant to sort the whole column (possibly by newest date).
Upvotes: 1
Reputation: 1816
Here you go, by using group-object
to group by Sender_Address and then using Select-object
with calculated properties to count the keyWord 'Auto Reply' and gather the related dates (sorted):
#Load csv
$csv = import-csv [path]
#Calculate output
$csv | Group-Object -Property 'sender_address' | Select-Object Name,@{name='keyWordCount';expression={($_.group.'message_subject' -match 'Auto Reply').count}},@{name='origin_timestamp_utc';expression={(($_.group | ?{$_.'message_subject' -match 'Auto Reply'}).'origin_timestamp_utc' | Sort-Object) -join ","}}
#Output
Name keyWordCount origin_timestamp_utc
---- ------------ --------------------
[email protected] 3 2022-09-27T19:38:02.2054022Z,2022-09-28T19:38:03.0335704Z,2022-09-28T19:38:03.06298…
[email protected] 2 2022-09-30T19:38:02.8562799Z,2022-10-04T02:33:07.6877938Z
[email protected] 1 2022-10-06T23:30:12.3284393Z
Upvotes: 1