Senior Systems Engineer
Senior Systems Engineer

Reputation: 1139

Parsing CSV and then count the occurrence by date?

I need to parse a CSV file like in the attached image, and then generate the summary report based on the frequency of the specific keywords in the message_subject column like 'Auto Reply:'

Source Input.CSV: https://pastebin.com/3iaUMCPd

enter image description here

The expected result is:

Sender_Address,        Frequency, Date (Ascending sorted)
[email protected], 3,         2022-09-27, 2022-09-28
[email protected], 2,         2022-10-04, 2022-09-30
[email protected], 1,         2022-10-06

How to achieve that?

Upvotes: 0

Views: 120

Answers (4)

Mathias R. Jessen
Mathias R. Jessen

Reputation: 174485

Use Group-Object to group the rows by sender address, then use Sort-Object -Unique to grab all unique dates:

Import-Csv path\to\file.csv |Group-Object sender_address |ForEach-Object {
  # Parse timestamps, grab unique dates
  $uniqueDates = $_.Group |ForEach-Object { [datetime]::Parse($_.origin_timestamp_utc).Date.ToString('yyyy-MM-dd') } |Sort-Object -Unique

  # Create a new object to hold the frequency info
  [pscustomobject]@{
    Sender_address = $_.Name
    Frequency = $_.Count
    Dates = $uniqueDates -join ', '
  }
} |Export-Csv path\to\result.csv -NoTypeInformation

Upvotes: 3

yxc
yxc

Reputation: 151

This solves your problem, including ascending sorting for the dates:

# gets all messages starting with "Auto Reply"
$messages = Import-Csv <your file> | Where-Object message_subject -match "Auto Reply.*"  
# changes timestamp format to date only
$messages | ForEach-Object { $_.origin_timestamp_utc = [datetime]::Parse($_.origin_timestamp_utc).Date.ToString('yyyy-MM-dd') } 
#groups by sender_adress
$messages | Group-Object sender_address | ForEach-Object {
    $sortedDates = ""
    # sorts dates so the date with the least amount of messages comes first (you can change this with Sort-Object -descending)
    $_.group | Group-Object origin_timestamp_utc | Sort-Object count | ForEach-Object {
        $sortedDates += $_.Name + ","
    }
    [pscustomobject]@{
      Sender_Address = $_.Name
      Frequency = $_.Count
      "Date (Ascending Sorted)" = $sortedDates.TrimEnd(",") #remove last ","
    }
} 

Add Export-Csv to export it

Upvotes: 1

zett42
zett42

Reputation: 27756

Here is my take on the problem:

# Create sample data
$csv = ConvertFrom-Csv @'
origin_timestamp_utc,sender_address,message_subject,directionality,connector_id,delivery_priority
2022-09-27T19:38:02.1826859Z,[email protected],Customer Vehicle Conditions report for 20220927,Originating,,Normal
2022-09-27T19:38:02.2054022Z,[email protected],Auto Reply: Vehicle Reports are due now.,Originating,,Normal
2022-09-28T19:38:03.0335704Z,[email protected],Auto Reply: Late in the office.,Originating,,Normal
2022-09-28T19:38:03.0629852Z,[email protected],Auto Reply: Sick not well :-(,Originating,,Normal
2022-10-05T19:38:02.1819402Z,[email protected],Customer Vehicle Conditions report for 20221005,Originating,,Normal
2022-10-05T19:38:02.1903258Z,[email protected],Customer Vehicle Conditions report for 20221005,Originating,,Normal
2022-10-06T19:38:02.2743619Z,[email protected],Customer Vehicle Conditions report for 20221006,Originating,,Normal
2022-10-06T19:38:02.2821327Z,[email protected],Customer Vehicle Conditions report for 20221006,Originating,,Normal
2022-09-29T19:38:02.1361255Z,[email protected],Customer Vehicle Conditions report for 20220929,Originating,,Normal
2022-09-29T19:38:02.1453714Z,[email protected],Customer Vehicle Conditions report for 20220929,Originating,,Normal
2022-09-30T19:38:02.8562799Z,[email protected],Auto Reply: Out of office today?!??!,Originating,,Normal
2022-09-30T19:38:02.8979890Z,[email protected],Customer Vehicle Conditions report for 20220930,Originating,,Normal
2022-10-03T19:38:02.5937670Z,[email protected],Customer Vehicle Conditions report for 20221003,Originating,,Normal
2022-10-04T02:33:07.6877938Z,[email protected],Auto Reply: Not in the Office ?,Originating,,Normal
2022-10-04T19:38:02.6117533Z,[email protected],Customer Vehicle Conditions report for 20221004,Originating,,Normal
2022-10-01T19:38:03.3718620Z,[email protected],Customer Vehicle Conditions report for 20221001,Originating,,Normal
2022-10-01T19:38:04.8114745Z,[email protected],Customer Vehicle Conditions report for 20221001,Originating,,Normal
2022-10-02T19:38:02.2517865Z,[email protected],Customer Vehicle Conditions report for 20221002,Originating,,Normal
2022-10-02T19:38:02.2855767Z,[email protected],Customer Vehicle Conditions report for 20221002,Originating,,Normal
2022-10-03T19:38:02.5937670Z,[email protected],Customer Vehicle Conditions report for 20221003,Originating,,Normal
2022-10-04T19:38:02.6383853Z,[email protected],Customer Vehicle Conditions report for 20221004,Originating,,Normal
2022-10-06T23:30:12.3284393Z,[email protected],Auto Reply: Where are you !!!!,Originating,,Normal
'@

$keyWord = 'Auto Reply:*'

$csv | Group-Object sender_address | ForEach-Object {

    # From current group, get all date/times where the message_subject matches the keyword
    $dates = $_.Group.Where{ $_.message_subject -like $keyWord }.ForEach{ $_.origin_timestamp_utc }

    # Remove the time part and get unique dates
    $uniqueDates = $dates -replace 'T.+' | Sort-Object -Unique

    # Output
    [PSCustomObject]@{
        Sender_Address = $_.Name
        Frequency = $dates.Count
        'Date (Ascending sorted)' = $uniqueDates -join ', '
    }
}

Output:

Sender_Address        Frequency Date (Ascending sorted)
--------------        --------- -----------------------
[email protected]         3 2022-09-27, 2022-09-28 
[email protected]         2 2022-09-30, 2022-10-04 
[email protected]         1 2022-10-06

Compared to the expected output, there is a slight difference in the Date column, 2nd row. I have interpreted the "ascending sorted" to be applied to each row individually. I'm not sure if OP actually meant to sort the whole column (possibly by newest date).

Upvotes: 1

Toni
Toni

Reputation: 1816

Here you go, by using group-object to group by Sender_Address and then using Select-object with calculated properties to count the keyWord 'Auto Reply' and gather the related dates (sorted):

#Load csv
$csv = import-csv [path]

#Calculate output
$csv | Group-Object -Property 'sender_address' | Select-Object Name,@{name='keyWordCount';expression={($_.group.'message_subject' -match 'Auto Reply').count}},@{name='origin_timestamp_utc';expression={(($_.group | ?{$_.'message_subject' -match 'Auto Reply'}).'origin_timestamp_utc' | Sort-Object) -join ","}}

#Output
Name                  keyWordCount origin_timestamp_utc
----                  ------------ --------------------
[email protected]            3 2022-09-27T19:38:02.2054022Z,2022-09-28T19:38:03.0335704Z,2022-09-28T19:38:03.06298…
[email protected]            2 2022-09-30T19:38:02.8562799Z,2022-10-04T02:33:07.6877938Z
[email protected]            1 2022-10-06T23:30:12.3284393Z

Upvotes: 1

Related Questions