Reputation: 11
I have a Linux server that generates several files throughout the day, and they need to be inserted into a database. Using PuTTY I can SFTP them to a server running SQL Server 2008. The problem is the structure of the file itself: each line is a string of text whose parts belong in different columns, but BULK INSERT in SQL Server tries to put it all into one column instead of six. PowerShell may not be the best method, but I have seen on several sites how it can find and replace text or append to the end of a line. Can it also count characters and insert a delimiter at a given position?
So the file looks like this: '18240087A +17135555555 3333333333', where 18, 24, 00, 87 and A are different columns. Then there is a blank space between the A and the +; characters 10-19 are another column, characters 20-30 are a column, characters 31-36 are spaces, which is a new column, and so on. I want to insert a '|' or a ',' so that SQL Server knows where each column ends. Can PowerShell insert delimiters at arbitrary character positions like this?
This may not be the right way to respond to everyone who answered; I apologize in advance. As this is my first PowerShell script, I appreciate the input from each of you. An Avaya SIP server generates the CDR records, which I must pull from the server and insert into SQL Server for later reports. The exported file looks like this:
18:47 10/15
18470214A +14434444444 3013777777 CME-SBC HHHH-CM 4 M00 0
At first I thought I would just delete the first line and run a script against the output, which I modified from Kieranties' post:
$test = Get-Content C:\Share\CDR\testCDR.txt
$pattern = "^(.{2})(.{2})(.{1})(.{2})(.{1})(.{1})\s*(.{15})(.{10})\s*(.{7})\s*(.{7})\s*(.{1})\s*(.{1})(.{1})(.{1})\s*(.*)$"
if($test -match $pattern){
    $result = $matches.Values | select -first ($matches.Count-1)
    [array]::Reverse($result, 0, $result.Length)
    $result = $result -join "|"
    $result | Out-File c:\Share\CDR\results1.txt
}
But then I realized I need that first line, as it contains the date. I can try to work that out another way, though.
I also now see that there are times when the file contains 2 or more lines of CDR info, such as:
18:24 10/15
18240087A +14434444444 3013777777 CME-SBC HRSA-CM 4 M00 0
18240096A +14434444445 3013777778 CME-SBC HRSA-CM 4 M00 0
The .ps1 file I made does not output the second record, so I tried adding this:
foreach ($Data in $test) { $Data = $Data -split(',')
and it fails to run. How can I handle multiple lines (and possibly that first line)? If you know of a tutorial that can help, that would be greatly appreciated as well!
Upvotes: 1
Views: 3575
Reputation: 11
PowerShell is a great tool that I love, and it can do many things. I see that you are using SQL Server 2008. Depending on the edition running on the server, it most likely includes SQL Server Integration Services (SSIS), an Extract, Transform, and Load (ETL) tool designed to help move data in scenarios such as yours. The file you describe sounds like a fixed-width file, which SSIS can easily import. SQL Server also has good ways to automate the loads if this is a recurring need (which it sounds like), including automating the SFTP task and even running PowerShell scripts as part of the ETL (I've done that several times).
If your file truly is fixed width and you want to use PowerShell to transform it into a delimited file, the regex approach you have in your answer works well. There are also several approaches using the System.String methods, such as .Insert(), which lets you insert a delimiter character at a given character index in the line (use Get-Content to read the file and create one String object per line, then loop through them with a foreach loop or with ForEach-Object and the pipeline). A slightly more difficult approach would be to use the .Substring() method: you build the new line by extracting each column with Substring and concatenating those values with a delimiter. That's probably a lot for someone new to PowerShell, but one of the best ways to learn and gain proficiency with it is to practice writing the same script multiple ways. You can learn new techniques that may solve other problems you encounter in the future.
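As a rough sketch of the two String-method approaches described above, the following uses the sample line from the question. The function names Add-Delimiter and Split-FixedWidth, and the offsets and widths passed to them, are made up for illustration; the real CDR layout would have to supply the actual column positions.

```powershell
# Approach 1: .Insert() at known character offsets (0-based).
function Add-Delimiter {
    param(
        [string]$Line,
        [int[]]$Offsets,
        [string]$Delimiter = '|'
    )
    # Work from right to left so that delimiters already inserted
    # do not shift the offsets that are still to be processed.
    foreach ($offset in ($Offsets | Sort-Object -Descending)) {
        if ($offset -lt $Line.Length) {
            $Line = $Line.Insert($offset, $Delimiter)
        }
    }
    $Line
}

# Approach 2: .Substring() with a list of column widths.
function Split-FixedWidth {
    param(
        [string]$Line,
        [int[]]$Widths,
        [string]$Delimiter = '|'
    )
    $pos = 0
    $fields = foreach ($width in $Widths) {
        if ($pos -ge $Line.Length) { break }               # line shorter than expected
        $len = [Math]::Min($width, $Line.Length - $pos)    # clamp the last column
        $Line.Substring($pos, $len).Trim()                 # drop the padding spaces
        $pos += $len
    }
    $fields -join $Delimiter
}

$sample = '18240087A +17135555555 3333333333'

# Hypothetical offsets and widths, chosen only to fit this one sample line.
Add-Delimiter    -Line $sample -Offsets 2, 4, 6, 8, 10, 23
# 18|24|00|87|A |+17135555555 |3333333333
Split-FixedWidth -Line $sample -Widths 2, 2, 2, 2, 2, 13, 10
# 18|24|00|87|A|+17135555555|3333333333
```

The Insert version keeps the padding spaces inside each field, while the Substring version trims them; which one is preferable depends on what the SQL Server import expects.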
Upvotes: 1
Reputation: 747
I've improved my answer based on your response (note: it's probably best to update your actual question to include that information).
The nice thing about Get-Content in PowerShell is that it returns the content as an array split on the end-of-line characters. Couple that with multiple assignment from an array and you end up with some neat code.
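To illustrate the multiple-assignment behaviour on its own, here is a small sketch that uses an in-memory array in place of a file. The variable names are illustrative, and the array stands in for what Get-Content would return for the sample file in the question.

```powershell
# Stand-in for Get-Content output: one string per line of the file.
$lines = @(
    '18:24 10/15'
    '18240087A +14434444444 3013777777 CME-SBC HRSA-CM 4 M00 0'
    '18240096A +14434444445 3013777778 CME-SBC HRSA-CM 4 M00 0'
)

# The first element is assigned to $first; everything left over goes to $rest.
$first, $rest = $lines

$first          # 18:24 10/15
@($rest).Count  # 2
```

One caveat: when only a single element is left over, $rest is a plain string rather than an array, which is why the count above is taken through @( ).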
The following has a function to process each line based on your modified version of my original answer. It's then wrapped by a function which processes the file.
This reads the given file, setting the first line to $date and the rest of the content to $content. It then creates an output file, adds the date to the output, and loops over the rest of the content, performing the regex check and adding the parsed version of each line if the check succeeds.
Function Parse-CDRFileLine {
    Param(
        [string]$line
    )
    $pattern = "^(.{2})(.{2})(.{1})(.{2})(.{1})(.{1})\s*(.{15})(.{10})\s*(.{7})\s*(.{7})\s*(.{1})\s*(.{1})(.{1})(.{1})\s*(.*)$"
    if($line -match $pattern){
        # $matches is a hashtable, so the order of .Values is not guaranteed.
        # Index the capture groups 1..N explicitly (group 0 is the whole match).
        $matches[1..($matches.Count-1)] -join "|"
    }
}
Function Parse-CDRFile{
    Param(
        [string]$filepath
    )
    # Read content, setting first line to $date, the rest to $content
    $date,$content = Get-Content $filepath
    # Create the output file, overwrite if necessary
    $outputFile = New-Item "$filepath.out" -ItemType file -Force
    # Add the date line
    Set-Content $outputFile $date
    # Process the rest of the content, skipping empty lines
    $content |
        ? { -not([string]::IsNullOrEmpty($_)) } |
        % { Add-Content $outputFile (Parse-CDRFileLine $_) }
}
Parse-CDRFile "C:\input.txt"
I used your sample input and the result I get is:
18:24 10/15
18|24|0|08|7|A|+14434444444 30|13777777 C|ME-SBC |HRSA-CM|4|M|0|0|0
18|24|0|09|6|A|+14434444445 30|13777778 C|ME-SBC |HRSA-CM|4|M|0|0|0
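Since the question mentions that the date from the first line is needed as well, one possible variation is to repeat that header on every record instead of writing it once at the top. This is only a sketch: the function name Add-DateColumn is hypothetical, and it assumes the target table has a leading column that can take the raw "time date" header as-is.

```powershell
function Add-DateColumn {
    param(
        [string[]]$Lines,           # first element is the header, the rest are parsed records
        [string]$Delimiter = '|'
    )
    $header, $records = $Lines
    foreach ($record in @($records)) {
        # Skip empty or whitespace-only records.
        if ($record -and $record.Trim()) {
            "$header$Delimiter$record"
        }
    }
}

$parsed = @(
    '18:24 10/15'
    '18|24|0|08|7|A|+14434444444 30|13777777 C|ME-SBC |HRSA-CM|4|M|0|0|0'
    '18|24|0|09|6|A|+14434444445 30|13777778 C|ME-SBC |HRSA-CM|4|M|0|0|0'
)
Add-DateColumn -Lines $parsed
```

Each output row then starts with 18:24 10/15 followed by the delimiter, so the rows can be loaded without a separate step for the header line.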
There is an incredible number of resources out there, but one I particularly suggest is Douglas Finke's PowerShell for Developers. It's short, concise, and full of great info that will get you thinking in the right mindset with PowerShell.
Upvotes: 0
Reputation: 24071
I don't quite follow the splitting rules. What kind of software writes the text file anyway? Maybe it can be instructed to change the structure?
That being said, inserting pipes is easy enough with .Insert():
$a= '18240087A +17135555555 3333333333'
$a.Substring(0, $a.IndexOf('+')).Insert(2, '|').insert(5,'|').insert(8, '|').insert(11, '|').insert(13, '|')
# Output: 18|24|00|87|A| (followed by the space that preceded the '+')
# Rest of the line:
$a.Substring($a.IndexOf('+')+1)
# Output: 17135555555 3333333333
From there you can proceed to splitting the rest of the row data.
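One way to carry on from there, as a sketch only: split the remainder on runs of whitespace and join everything with pipes. This assumes the fields after the '+' never contain spaces and are never blank, which may not hold for real CDR records. Unlike the snippet above, it keeps the '+' as part of the number.

```powershell
$a = '18240087A +17135555555 3333333333'

$plus = $a.IndexOf('+')
if ($plus -lt 0) { throw "No '+' found in line: $a" }

# Head: the fixed positions before the '+', without the trailing space.
$head = $a.Substring(0, $plus).TrimEnd().
    Insert(2, '|').Insert(5, '|').Insert(8, '|').Insert(11, '|')

# Rest: everything from the '+' onward, split on runs of whitespace.
$rest = $a.Substring($plus) -split '\s+' | Where-Object { $_ }

$row = (@($head) + $rest) -join '|'
$row
# 18|24|00|87|A|+17135555555|3333333333
```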
Upvotes: 0
Reputation: 60918
This is one way (really ugly IMO; I think it can be done better):
$a = '18240087A +17135555555 3333333333'
$b = @( ($a[0..1] -join ''), ($a[2..3] -join ''), ($a[4..5] -join ''),
        ($a[6..7] -join ''), ($a[8] -join ''), ($a[10..19] -join ''),
        ($a[20..30] -join ''), ($a[31..36] -join '') )
$c = $b -join '|'
$c
18|24|00|87|A|+171355555|55 33333333|33
I don't know if this is the right splitting for you, but by changing the values in each [x..y] you can make it fit your needs. Remember that character arrays are 0-based, so the first character is at index 0, and so on.
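To make the [x..y] values easier to adjust, the same slicing can be driven from a list of start/end pairs. The pairs below are hypothetical and were picked only so that this one sample line splits on its visible boundaries; the end index is clamped so that a short line does not produce out-of-range indexes.

```powershell
$a = '18240087A +17135555555 3333333333'

# Hypothetical [start, end] pairs (0-based, inclusive); adjust to the real layout.
$ranges = @(0, 1), @(2, 3), @(4, 5), @(6, 7), @(8, 8), @(10, 21), @(23, 32)

$fields = foreach ($r in $ranges) {
    $start = $r[0]
    $end   = [Math]::Min($r[1], $a.Length - 1)   # clamp to the end of the line
    if ($start -le $end) {
        $a[$start..$end] -join ''
    } else {
        ''                                        # column missing on a short line
    }
}

$c = $fields -join '|'
$c
# 18|24|00|87|A|+17135555555|3333333333
```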
Upvotes: 0