SKPS
SKPS

Reputation: 5835

Case insensitive string comparisons in Fortran

Though Fortran is case insensitive, I observe the same is not the case when strings are compared using if check. Of course, the user should be aware of this.

However, Fortran being case insensitive, subconsciously I expect the same to apply for string checks as well. What's the efficient way to achieve case insensitive string comparison?

Upvotes: 2

Views: 2028

Answers (3)

This is a problem that is commonly solved in many programming languages, see, e.g., Case insensitive string comparison How to do case insensitive string comparison? Case-insensitive string comparison in C++ How can I do a case insensitive string comparison?

The usual way is the same, convert your strings all to lowercase or all to uppercase and compare these converted strings. Some languages have library functions, in some way you have to do it yourselves. If you are staying with ASCII characters, you can just subtract or add 32 to small case or to upper case characters respectively. I you are using some extended character sets like Unicode, you will need some library that can do the conversion for you (e.g., for converting from Ď to ď, or from Щ to щ,these really should be left to a specific library and they will appear in real world data string processing).


As to the subconscious that strings might be case insensitive in Fortran - there is no support for the standard for that and it would also be very impractical and difficult to do in general (e.g., for things like UTF8). It is much easier to add the in-sensitivity as an additional feature to your program rather than the other way,

Upvotes: 2

High Performance Mark
High Performance Mark

Reputation: 78344

Here's another approach which might appeal, let the compiler figure out the character codes and do all the arithmetic:

MODULE strings

  IMPLICIT NONE
  PRIVATE
  PUBLIC :: to_upper
  
  CHARACTER(len=26), PARAMETER :: uca = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
  CHARACTER(len=26), PARAMETER :: lca = 'abcdefghijklmnopqrstuvwxyz'
    
CONTAINS

  PURE CHARACTER FUNCTION to_upper_ch(ch)
    CHARACTER, INTENT(in) :: ch
    INTEGER :: ix
    to_upper_ch = ch
    ix = INDEX(lca,ch)
    IF (ix /= 0) to_upper_ch = uca(ix:ix)
  END FUNCTION to_upper_ch

  PURE FUNCTION to_upper(str) RESULT(rslt)
    CHARACTER(len=*), INTENT(in) :: str
    CHARACTER(len=len(str)) :: rslt
    INTEGER :: ix
    DO ix = 1, LEN(str)
       rslt(ix:ix) = to_upper_ch(str(ix:ix))
    END DO    
  END FUNCTION to_upper

END MODULE strings

... and if you're thinking this is a bit laborious, two functions to convert a string to upper case, you're probably right, but this is a selection from a strings module I wrote where it makes sense to structure the code this way. As to this approach's efficiency compared with @King's code, test and measure if it matters to you.

Upvotes: 4

Scientist
Scientist

Reputation: 1835

As @Vladimir says in his answer, string case-sensitvity does matter in Fortran for string comparison. Moreover, trailing blanks are ignored in string comparison in Fortran, that is, "SKPS"=="SKPS " evaluates to .true. (the same does NOT hold for leading blanks). Here is an efficient implementation of what you need:

module String_mod

    implicit none
    public

contains

    pure function getLowerCase(string) result(output)
        ! convert string to lower-case
        character(*), intent(in)    :: string
        integer, parameter          :: DUC = ichar('A') - ichar('a')
        character(len(string))      :: output
        character                   :: ch
        integer                     :: i
        do i = 1,len(string)
            ch = string(i:i)
            if (ch>='A' .and. ch<='Z') ch = char(ichar(ch)-DUC)
            output(i:i) = ch
        end do
    end function getLowerCase

    pure function getUpperCase(string) result(output)
        ! convert string to upper-case
        character(*), intent(in)    :: string
        integer, parameter          :: DUC = ichar('A') - ichar('a')
        character(len(string))      :: output
        character                   :: ch
        integer                     :: i
        do i = 1,len(string)
            ch = string(i:i)
            if (ch>='a' .and. ch<='z') ch = char(ichar(ch)+DUC)
            output(i:i) = ch
        end do
    end function getUpperCase

end module String_mod

program stringComparison

    use, intrinsic :: iso_fortran_env, only: output_unit
    use String_mod, only: getLowerCase
    implicit none
    character(:), allocatable :: thisString, thatString

    thisString = "STACKOVERFLOW"
    thatString = "StackOverflow"

    if (getLowerCase(thisString)==getLowerCase(thatString)) then
        write(output_unit,"(*(g0,:,' '))") "Ignoring case-sensitivity, the strings are the same:" &
                                         , '"'//thisString//'"', "==", '"'//thatString//'"'
    else
        write(output_unit,"(*(g0,:,' '))") "strings are NOT the same:", '"'//thisString//'"', "/=", '"'//thatString//'"'
    end if

    ! Be mindful of trailing white-space characters. They do not matter in string comparison.

    thisString = "STACKOVERFLOW"
    thatString = "StackOverflow  "

    if (getLowerCase(thisString)==getLowerCase(thatString)) then
        write(output_unit,"(*(g0,:,' '))") "Ignoring case-sensitivity, the strings are the same:" &
                                         , '"'//thisString//'"', "==", '"'//thatString//'"'
    else
        write(output_unit,"(*(g0,:,' '))") "strings are NOT the same:", '"'//thisString//'"', "/=", '"'//thatString//'"'
    end if

    ! Be mindful of leading white-space characters. They are important in string comparison.

    thisString = "STACKOVERFLOW"
    thatString = "  StackOverflow"

    if (getLowerCase(thisString)==getLowerCase(thatString)) then
        write(output_unit,"(*(g0,:,' '))") "Ignoring case-sensitivity, the strings are the same:" &
                                         , '"'//thisString//'"', "==", '"'//thatString//'"'
    else
        write(output_unit,"(*(g0,:,' '))") "strings are NOT the same:", '"'//thisString//'"', "/=", '"'//thatString//'"'
    end if

end program stringComparison

Compiling and running the above code gives:

Ignoring case-sensitivity, the strings are the same: "STACKOVERFLOW" == "StackOverflow"
Ignoring case-sensitivity, the strings are the same: "STACKOVERFLOW" == "StackOverflow  "
strings are NOT the same: "STACKOVERFLOW" /= "  StackOverflow"

If this is a performance-critical section of the code, you could likely achieve even better performance to by inlining the getLowerCase() function and modifying it to directly compare the ASCII character codes of the letters. But is it really worth it? It depends on the specifics of your problem.

You can test the above code here: https://www.tutorialspoint.com/compile_fortran_online.php

Upvotes: 2

Related Questions