d0lph1n
d0lph1n

Reputation: 152

reading an excel file, checking that format and values are correct

I am doing a test here on reading an excel file, checking the format and values of the cells.

I need to check these things specifically:

Whats the correct approach here?

Now Im making functions for checking Integers like:

Int(sheet.Cells(row, col)) = sheet.Cells(row, col)     

This will check the integer values, but whats the best way of checking all the other things? I've heard about a validator object I can use.

Anyone have any tricks on this subject?

Upvotes: 2

Views: 3292

Answers (2)

knb
knb

Reputation: 9303

If you need to parse an external file, here is a little perl script (untested from the top off my head).

use Regexp::Common;
use Test::More;
#use Spreadsheet::ParseExcel; if using excel <=2003

use Spreadsheet::XLSX;


my $excel = Spreadsheet::XLSX -> new ('test.xlsx');

foreach my $sheet (@{$excel -> {Worksheet}}) {


    printf("Sheet: %s\n", $sheet->{Name});

    $sheet -> {MaxRow} ||= $sheet -> {MinRow};

     foreach my $row ($sheet -> {MinRow} .. $sheet -> {MaxRow}) {

            $sheet -> {MaxCol} ||= $sheet -> {MinCol};

            foreach my $col ($sheet -> {MinCol} ..  $sheet -> {MaxCol}) {

                    my $cell = $sheet -> {Cells} [$row] [$col];

                    if ($cell) {
                     if ($col == 0){ #colA
                         $cell =~ qr/$RE{num}{int}/ or fail "Value '$cell' in cell($row, $col) is not an int";
                     }
                     if ($col == 1){ #colB


                         int($cell) or fail "Value '$cell' in cell($row, $col) cannot be parsed to an int";
                         $cell =~ /\d{4}/ or fail "Value '$cell' in cell($row, $col) does not consist of 4 digits"; # must they be consecutive?


                     }
                     if ($col == 3){ #D
                          my  $cellC = $sheet -> {Cells} [$row] [$col - 1]
                          if ($cellC == 1){
                             $cell =~ qr/$RE{num}{int}/ or fail "Value '$cell' in cell($row, $col) is not an int although ColC is 1 ";
                          } 

                     }
                     # one more test for colE
                    }

            }
    }
}
done_testing();

For the colE test, you have to find your own patterns / regular expressions, but that should not be too difficult. For checking time values in perl, see here: http://metacpan.org/pod/Regexp::Common::time I have not used this module, though.

See also http://metacpan.org/pod/Regexp::Common

In VBA you have to load a Microsoft Regular Expression type library (which comes pre-installed on windows). The whole VBA code will be similar to my pseudocode above, but more verbose.

Upvotes: 3

Rachel Hettinger
Rachel Hettinger

Reputation: 8442

This code validates the ActiveSheet per your rules. For Column B, I understood you to mean that the value is actually text, not that the cell is formatted with leading zeros.

Sub Validate()
    Dim lRow As Long
    Dim lNumRows As Long
    Dim bRowValid As Boolean
    Dim bSheetValid As Boolean

    With ActiveSheet
        bSheetValid = True    ' initial assumption is sheet is valid
        lNumRows = .Cells(.Rows.Count, 1).End(xlUp).Row
        For lRow = 2 To lNumRows
            bRowValid = IsInteger(.Cells(lRow, 1).Value)
            bRowValid = bRowValid And IsFormatted(.Cells(lRow, 2).Value)
            If .Cells(lRow, 3).Value = 1 Then
                bRowValid = bRowValid And IsInteger(.Cells(lRow, 4).Value)
            End If
            bRowValid = bRowValid And IsTime(.Cells(lRow, 5).Value)
            bSheetValid = bSheetValid And bRowValid
            If Not bRowValid Then
                ' do something here if you want to flag this row
            End If
        Next lRow
    End With

    If bSheetValid Then
        ' copy data to historical sheet
    End If
End Sub
Function IsInteger(vValue As Variant) As Boolean
    If VarType(vValue) = vbDouble Then
        IsInteger = (Fix(vValue) = vValue)
    Else
        IsInteger = False
    End If
End Function
Function IsFormatted(vValue As Variant) As Boolean
    If VarType(vValue) = vbString Or VarType(vValue) = vbDouble Then
        IsFormatted = vValue Like "[0-9][0-9][0-9][0-9]"
    Else
        IsFormatted = False
    End If
End Function
Function IsTime(vValue As Variant) As Boolean
    If IsFormatted(vValue) Then
       IsTime = IsDate(Left$(vValue, 2) & ":" & Right$(vValue, 2))
    Else
        IsTime = False
    End If
End Function

Here are some alterations you might want to consider:

  • Change the For...Loop to a Do...Loop so that you can stop as soon as invalid data is found. Do this if you don't care to know which rows were invalid.
  • Add highlighting for invalid data if you want to see the errors. Do this in the If Not bRowValid... block.
  • Change Sub Validate to a function that takes the worksheet as a parameter and returns a boolean. IOW, separate the validation from the code that copies data to the historical sheet.

Upvotes: 4

Related Questions