Reputation: 15
I download a .csv
file daily which I need to clean up a bit before using it in another software.
The file have data about controls conducted at restaurants, where new controls results in a new row in the file, which means each restaurant can have several entries - I only need the newest.
Here's a portion of the .csv
file(only included 4 of the columns):
Headers
orgnummer;navn;dato;total_karakter
4 random rows
985129576;Økern Sushi;21092016;1
785423684;Å cafe;09072017;2
458792365;Varangerkroa;01012018;0
985129576;Økern Sushi;05052018;0
orgnummer
and navn
will be unique for each restaurant.
In my example the script should remove the top entry of Økern Sushi.
My script so far:
$temp = Import-Csv 'C:\Users\Downloads\tilsynPS.csv' -Delimiter ';'
#change date format to desired format
foreach($row in $temp) {
$year = $row.dato.Substring($row.dato.Length - 4, 4)
$month = $row.dato.Substring($row.dato.Length - 6, 2)
$day = $row.dato.Substring(0, $row.dato.Length - 6)
$date = New-Object System.DateTime $year,$month,$day
$row.dato = $date
}
#Here's my attempt at sorting and deleting old records:
sort orgnummer, dato -Descending
$temp[0]
for ($i=1; $i -le $temp.length -1; $i++) {
if ($temp[$i]."orgnummer" -eq $temp[$i-1]."orgnummer"){
continue
}
else {$temp[$i]}
}
#export to csv
$temp | Export-Csv -Encoding UTF8 -NoTypeInformation -path C:\Users\Downloads\tilsynPS_redigert.csv
The attempt at removing old records is based on this post: http://community.idera.com/powershell/ask_the_experts/f/powershell_for_windows-12/8073/csv-remove-unique-records-based-on-columns-and-last-date
have also tried with this version of the code:
sort -Property @{Expression="dato";Descending=$true},
@{Expression="navn";Descending=$false}
$temp[0]
for ($i=1; $i -le $temp.length -1; $i++) {
if ($temp[$i]."navn" -eq $temp[$i-1]."navn"){
continue
}
else {$temp[$i]}
}
This is my first script in PowerShell, any help is much appreciated. Have noticed similar questions, but none that are in PowerShell which is what I'm working with here.
Upvotes: 1
Views: 811
Reputation: 61068
I think you almost had it. This solution accumulates the results in a new array and exports that. Rows are only added if the orgnummer is different or the date they hold is newer.
Please note that for testing I had to change the path to the import and export files.
$temp = Import-Csv "$PSScriptRoot\tilsynPS.csv" -Delimiter ';'
#change date format to desired format
foreach($row in $temp) {
$year = $row.dato.Substring($row.dato.Length - 4, 4)
$month = $row.dato.Substring($row.dato.Length - 6, 2)
$day = $row.dato.Substring(0, $row.dato.Length - 6)
$row.dato = New-Object System.DateTime $year,$month,$day
}
# sort on orgnummer and date descending:
$temp = $temp | Sort-Object orgnummer, dato -Descending
# create a new array for the output and add the first entry in it already
$newData = @($temp[0])
# for all other rows in the array, check if they should be added or not
for ($i = 1; $i -le $temp.Length -1; $i++) {
if ($temp[$i]."orgnummer" -eq $temp[$i-1]."orgnummer" -and $temp[$i]."dato" -le $temp[$i-1]."dato") {
continue
}
else {
$newData += $temp[$i]
}
}
#export to csv
$newData | Export-Csv -Encoding UTF8 -NoTypeInformation -path "$PSScriptRoot\tilsynPS_redigert.csv"
Upvotes: 1