twoLeftFeet

Reputation: 720

Powershell PG_Dump Script Maxing out RAM

I have a PowerShell script that loops through our Postgres databases and runs pg_dump on each one, writing a SQL dump file. The issue is that it uses all of my available RAM. I'm wondering if there's a way to optimize the script so that doesn't happen.

Powershell script:

$file = "output.csv"
$pguser = "postgres"

# start log file
Start-Transcript -Path "C:\transcripts\pg-backup.$(Get-Date -Format yyyy-MM-dd-hh-mm).transcript.txt"

# get password
Write-Host "Reading password file..."
$password = Get-Content "C:\Scripts\pg_pass.txt"

Write-Host "Password read."
$env:PGPASSWORD = $password

# get database names and put them in a csv
# Name,Size,etc
psql -U $pguser -l -A -F "," > $file

# remove first line 
Get-Content $file |
    Select-Object -Skip 1 |
    Set-Content "$file-temp"
Move-Item "$file-temp" $file -Force

$result = Import-Csv $file
Remove-Item $file

Write-Host "Databases queried: $($result.length)"

# Loop through each database name
# and dump it, upload it, delete it
ForEach($row in $result){
    Write-Host "Processing database $(1 + $result::IndexOf($result, $row)) of $($result.length)"
    $db = $row.Name

    # skip rows that aren't databases
    if(!$db.Contains('/') -and !$db.Contains(')')){

        Write-Host "Backing up: $($db)"
        $dumpfile = "$(Get-Date -Format yyyy-MM-dd-hh-mm).$($db).dump"

        # dump it 
        Write-Host "Creating Dump File: $dumpfile"
        pg_dump -U $pguser -F c $db > $dumpfile
        Write-Host "Dump File Created."

        # s3 it 
        Write-Host "Uploading to S3..."
        aws s3 cp $dumpfile s3://my-s3-bucket
        Write-Host "File Uploaded successfully."

        # delete it
        Write-Host "Removing dumpfile."
        Remove-Item $dumpfile
        Write-Host "File Removed."
    }
}

Stop-Transcript

How I'm running it:

Script:

C:\WINDOWS\system32\WindowsPowerShell\v1.0\powershell.exe

Arguments:

-noprofile -NonInteractive -WindowStyle hidden -file C:\Scripts\pg-backup.ps1

My transcript shows:

**********************
Windows PowerShell transcript start
Start time: 20190904211002
Username: ****
RunAs User: *****
Machine: ***** (Microsoft Windows NT 10.0.14393.0)
Host Application: C:\WINDOWS\system32\WindowsPowerShell\v1.0\powershell.exe -noprofile -NonInteractive -WindowStyle hidden -file C:\Scripts\pg-backup.ps1
Process ID: 5840
PSVersion: 5.1.14393.2636
PSEdition: Desktop
PSCompatibleVersions: 1.0, 2.0, 3.0, 4.0, 5.0, 5.1.14393.2636
BuildVersion: 10.0.14393.2636
CLRVersion: 4.0.30319.42000
WSManStackVersion: 3.0
PSRemotingProtocolVersion: 2.3
SerializationVersion: 1.1.0.1
**********************
Transcript started, output file is C:\transcripts\pg-backup.2019-09-04-09-10.transcript.txt
Reading password file...
Password read.
Databases queried: 85
Processing database 1 of 85
Backing up: my_database
Creating Dump File: 2019-09-04-09-10.my_database.dump

It ends there. Eventually task scheduler kills the process because it hangs so long.

Upvotes: 0

Views: 2463

Answers (2)

twoLeftFeet

Reputation: 720

I found a simple solution. The PG docs mention that, by default, pg_dump sends its output to standard output. I think that's what was using all my RAM, since PowerShell is probably caching the entire dump in memory before the > redirect writes it out.

pg_dump also accepts a file parameter that makes it write directly to a file. That avoids the RAM issue, since pg_dump puts the content into the file itself instead of passing it through PowerShell.

-f file
--file=file
Send output to the specified file. If this is omitted, the standard output is used.
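
Applied to the loop in the question, the dump line becomes something like this (a minimal sketch, reusing the same variables as the original script):

# let pg_dump write the dump file itself instead of redirecting
# its stdout through PowerShell
pg_dump -U $pguser -F c -f $dumpfile $db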

Upvotes: 0

Angos

Reputation: 11

This is a guess, but I think that $file is rather large.

Writing it, reading it back to remove one line, writing it again, and then importing it puts the data into memory several times over. I would handle it something like this to avoid all that copying of objects:

psql -U $pguser -l -A -F "," | Select-Object -Skip 1 | ConvertFrom-Csv | ForEach-Object {
    $db = $_.Name
    ...

if you still want the write-host line:

$result = psql -U $pguser -l -A -F "," | Select-Object -Skip 1 | ConvertFrom-Csv
Write-Host "Databases queried: $($result.Count)"
foreach ($row in $result) {
    $db = $row.Name
    ...

Assigning a command's output to a variable, or piping it, just makes that output the contents of the variable (with piping, it becomes the pipeline variable $_). While I don't use PostgreSQL, I would expect something like this to work. It avoids creating multiple copies of the objects, avoids the extra disk writes and reads, and should help with memory usage.
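
As a minimal illustration of the difference (big.txt here is just a hypothetical large file):

# assignment collects the entire output into an array in memory:
$lines = Get-Content big.txt

# piping streams one object at a time through $_, so only one
# line is held at any moment:
Get-Content big.txt | ForEach-Object { $_ }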

Upvotes: 1
