Ralph M. Rivera

Reputation: 789

Excluding Content From SQL Bulk Insert

I want to import my IIS logs into SQL for reporting using Bulk Insert, but the comment lines - the ones that start with a # - cause a problem because those lines do not have the same number of fields as the data lines.

If I manually delete the comments, I can perform the bulk insert.
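
For context, the statement I'm running looks roughly like this (the table and file names here are just placeholders), since the W3C log format is space-delimited:

BULK INSERT dbo.IisLog
FROM 'C:\inetpub\logs\LogFiles\W3SVC1\u_ex140301.log'
WITH
(
    FIELDTERMINATOR = ' ',
    ROWTERMINATOR = '\n'
);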

Is there a way to perform a bulk insert while excluding lines based on a match, such as: any line that begins with a "#"?

Thanks.

Upvotes: 1

Views: 1936

Answers (4)

maomoa

Reputation: 11

Following up on what PeterX wrote, I modified the application to handle large log files, since anything sufficiently large would cause an out-of-memory exception. Also, since we only care whether a line starts with a hash, we can simply call the StartsWith() method on each line as it is read.

using System;
using System.IO;

class Program
{
    static void Main(string[] args)
    {
        if (args.Length == 2)
        {
            string path = args[0];
            string outPath = args[1];
            string line;

            // Stream each log file line by line so large files never have to fit in memory.
            foreach (string file in Directory.GetFiles(path, "*.log"))
            {
                using (StreamReader sr = new StreamReader(file))
                {
                    using (StreamWriter sw = new StreamWriter(Path.Combine(outPath, new FileInfo(file).Name), false))
                    {
                        while ((line = sr.ReadLine()) != null)
                        {
                            // Copy only data lines; IIS comment lines start with "#".
                            if (!line.StartsWith("#"))
                            {
                                sw.WriteLine(line);
                            }
                        }
                    }
                }
            }
        }
        else
        {
            Console.WriteLine("Source and Destination Log Path required or too many arguments");
        }
    }
}

Upvotes: 1

PeterX

Reputation: 2901

Microsoft has a tool called "PrepWebLog" (http://support.microsoft.com/kb/296093) which strips out these hash/pound characters. However, I'm running it now (using a PowerShell script for multiple files) and am finding its performance intolerably slow.

I think it'd be faster if I wrote a C# program (or maybe even a macro).


Update: PrepWebLog just crashed on me. I'd avoid it.


Update #2: I looked at PowerShell's Get-Content and Set-Content commands but didn't like the syntax or the likely performance. So I wrote this little C# console app:

// Requires: using System.IO; and using System.Text.RegularExpressions;
if (args.Length == 2)
{
    string path = args[0];
    string outPath = args[1];

    // Matches entire comment lines (those starting with "#"), including the trailing CRLF.
    Regex hashString = new Regex("^#.+\r\n", RegexOptions.Multiline | RegexOptions.Compiled);
    foreach (string file in Directory.GetFiles(path, "*.log"))
    {
        // Read the whole file, strip the comment lines, and write the result
        // to a file of the same name in the output folder.
        string data;
        using (StreamReader sr = new StreamReader(file))
        {
            data = sr.ReadToEnd();
        }

        string output = hashString.Replace(data, string.Empty);
        using (StreamWriter sw = new StreamWriter(Path.Combine(outPath, new FileInfo(file).Name), false))
        {
            sw.Write(output);
        }
    }
}
else
{
    Console.WriteLine("Source and Destination Log Path required or too many arguments");
}

It's pretty quick.


Upvotes: 0

Remus Rusanu

Reputation: 294387

I recommend using logparser.exe instead. LogParser has some pretty neat capabilities on its own, but it can also be used to format the IIS log so that it can be properly imported by SQL Server.
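
A minimal sketch of that kind of invocation (the server, database, and table names here are just placeholders, and the input format is assumed to be IIS W3C):

LogParser "SELECT * INTO IisLog FROM C:\Logs\u_ex*.log" -i:IISW3C -o:SQL -server:MyServer -database:WebLogs -createTable:ON

Because LogParser understands the W3C format, it ignores the "#" directive lines on its own and maps the named fields to table columns.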

Upvotes: 1

mikurski

Reputation: 1363

The approach I generally use with BULK INSERT and irregular data is to push the incoming data into a temporary staging table with a single VARCHAR(MAX) column.

Once it's in there, I can use more flexible decision-making tools like SQL queries and string functions to decide which rows I want to select out of the staging table and bring into my main tables. This is also helpful because BULK INSERT can be maddeningly cryptic about why it fails on a specific file.
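
A minimal T-SQL sketch of that staging pattern (the table, column, and file names here are assumptions, not from the question):

-- One wide column means every line loads, regardless of how many fields it has;
-- the default field terminator is a tab, which IIS W3C logs don't contain.
CREATE TABLE dbo.LogStaging (RawLine VARCHAR(MAX));

BULK INSERT dbo.LogStaging
FROM 'C:\Logs\u_ex140301.log'
WITH (ROWTERMINATOR = '\n');

-- Comment lines start with "#", so leave them behind when moving rows on.
INSERT INTO dbo.IisLogLines (RawLine)
SELECT RawLine
FROM dbo.LogStaging
WHERE RawLine NOT LIKE '#%';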

The only other option I can think of is using pre-upload scripting to trim comments and other lines that don't fit your tabular criteria before you do your bulk insert.

Upvotes: 2
