flavinsky

Reputation: 319

Julia: How to find the longest word in a given string?

I am very new to Julia. I found this challenge on the web:

How can I find the longest word in a given string?

I would like to write a function that returns the longest word, even when the string contains punctuation.

I tried the following code:

function LongestWord(sen::String)
    sentence = maximum(length(split(sen, "")))
    word = [(x, length(x)) for x in split(sen, " ")]
    return((word))
end

LongestWord("Hello, how are you? nested, punctuation?")

But I haven't managed to find a solution.

Upvotes: 1

Views: 935

Answers (3)

Mark Birtwistle

Reputation: 372

My version explicitly defines which symbols are allowed (in this case letters, digits and whitespace):

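const ALLOWED_SYMBOLS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890 \t\n"

function get_longest_word(text::String)::String
    letters = Vector{Char}()
    for symbol in text
        # keep only ASCII letters, digits and whitespace; punctuation is dropped
        if uppercase(symbol) in ALLOWED_SYMBOLS
            push!(letters, symbol)
        end
    end
    words = split(join(letters))            # split on runs of whitespace
    return words[argmax(length.(words))]    # argmax replaces indmax in Julia 1.0+
end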

@time get_longest_word("Hello, how are you? nested, punctuation?")

"punctuation"

I doubt it's the most efficient code in the world, but it pulls 'ANTIDISESTABLISHMENTARIANISM' out of a 45,000-word dictionary in about 0.1 seconds. Of course, it won't tell me if there is more than one word of the maximum length! That's a problem for another day...
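One way to handle ties is sketched below. The name get_longest_words is hypothetical, and instead of the ALLOWED_SYMBOLS whitelist it swaps in Julia's Unicode-aware predicates isletter, isdigit and isspace (Julia 1.0+), so accented letters are kept too. It returns every word that shares the maximum length rather than just one.

```julia
# Hypothetical helper (not part of the answer above): return *all* words
# that tie for the maximum length.
function get_longest_words(text::AbstractString)
    # Keep letters, digits and whitespace; drop punctuation.
    # (Swapped-in technique: Unicode-aware predicates instead of a whitelist.)
    cleaned = filter(c -> isletter(c) || isdigit(c) || isspace(c), text)
    words = split(cleaned)                   # split on runs of whitespace
    isempty(words) && return String[]        # no words at all
    maxlen = maximum(length, words)          # length of the longest word(s)
    return [String(w) for w in words if length(w) == maxlen]
end

println(get_longest_words("apple banana cherry kiwi"))   # ["banana", "cherry"]
```

The extra pass over words to collect the ties is cheap compared with the filtering step, so this should not change the running time much.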

Upvotes: 0

niczky12

Reputation: 5073

You can use a regex too. It needs only a slight change from @Bogumil's answer:

julia> function LongestWord2(sen::AbstractString)
           words = [m.match for m in eachmatch(r"\w+", sen)]
           words[findmax(length.(words))[2]]
       end
LongestWord2 (generic function with 1 method)

julia> LongestWord2("Hello, how are you? nested, punctuation?")
"punctuation"

This way you get rid of the punctuation and get the bare word back.

To consolidate the comments, here's some further explanation:

eachmatch() takes a regex, in this case r"\w+", which matches runs of word characters (letters, digits and underscores), and iterates over the matches; m.match extracts each matched substring, so words is an array of the strings that match the regex. (Before Julia 1.0 this was a single call, matchall(r"\w+", sen); matchall was removed in 1.0.)

length.() combines the length function with the dot syntax, which broadcasts the operation over every element of the array. So we're counting the length of each array element (word).

findmax() returns a tuple of length 2: the maximum value and its index. I use the second element, the index of the first maximal element, to index into the words array and return the longest word.
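To make the three steps concrete, here is a sketch that runs them one at a time on the example sentence so the intermediate values are visible. It assumes Julia 1.0 or later, where the word list is built with eachmatch rather than matchall.

```julia
sen = "Hello, how are you? nested, punctuation?"

# Step 1: collect every run of word characters (letters, digits, underscore).
words = [m.match for m in eachmatch(r"\w+", sen)]
# words holds "Hello", "how", "are", "you", "nested", "punctuation"

# Step 2: broadcast `length` over the vector, giving one length per word.
lens = length.(words)
# lens == [5, 3, 3, 3, 6, 11]

# Step 3: findmax returns (largest value, index of its first occurrence).
maxlen, idx = findmax(lens)
# maxlen == 11, idx == 6

println(words[idx])   # prints punctuation
```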

Upvotes: 4

Bogumił Kamiński

Reputation: 69949

I understand that you want to retain punctuation and split only on spaces (" "). If so, you can use findmax. Note that I have swapped the order of length(x) and x: since tuples are compared element by element, this finds the longest word, and among words of equal maximum length it picks the one that comes last in string comparison. I also used AbstractString in the signature so that the function works on any string type:

julia> function LongestWord(sen::AbstractString)
           word = [(length(x), x) for x in split(sen, " ")]
           findmax(word)[1][2]
       end
LongestWord (generic function with 1 method)

julia> LongestWord("Hello, how are you? nested, punctuation?")
"punctuation?"
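To see why putting length(x) first gives that tie-breaking rule, here is a small check, assuming Julia 1.x: tuples compare lexicographically, so lengths are compared first and the strings only decide ties. The word list is made up for illustration.

```julia
# Tuples compare lexicographically: first elements first, second only on a tie.
@assert (11, "punctuation?") > (7, "nested,")   # decided by the lengths alone
@assert (5, "apple") < (5, "mango")             # equal lengths: "apple" < "mango"

# Hence, among equally long words, findmax picks the one that sorts last.
words = ["apple", "mango", "kiwi"]
pairs = [(length(w), w) for w in words]
best, idx = findmax(pairs)
println(best[2])   # prints mango
```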

This is the simplest solution but not the fastest: you could instead walk through the original string, using the findnext function to locate consecutive spaces, without creating the word vector.
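A minimal sketch of that findnext idea, assuming Julia 1.x. The name longest_word_findnext is hypothetical. Like LongestWord it splits only on ' ' and keeps punctuation attached, but on ties it keeps the first longest word it meets rather than the one that sorts last. It returns a SubString view, so no word vector is allocated; whether it is actually faster has not been measured here.

```julia
# Hypothetical function: scan for spaces with findnext instead of calling split.
function longest_word_findnext(sen::AbstractString)
    best_start, best_stop, best_len = 1, 0, 0   # empty range by default
    start = firstindex(sen)
    while start <= ncodeunits(sen)
        # index of the next space, or `nothing` if this is the last word
        sp = findnext(isequal(' '), sen, start)
        stop = sp === nothing ? lastindex(sen) : prevind(sen, sp)
        len = length(sen, start, stop)          # characters in sen[start:stop]
        if len > best_len                       # strict >, so ties keep the first
            best_start, best_stop, best_len = start, stop, len
        end
        sp === nothing && break
        start = nextind(sen, sp)                # continue after the space
    end
    return SubString(sen, best_start, best_stop)
end

println(longest_word_findnext("Hello, how are you? nested, punctuation?"))   # punctuation?
```

Using nextind/prevind instead of ±1 keeps the indices valid for multi-byte UTF-8 characters.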

Another approach (even shorter):

julia> function LongestWord3(sen::AbstractString)
           word = split(sen, " ")
           word[argmax(length.(word))]    # argmax replaces indmax in Julia 1.0+
       end
LongestWord3 (generic function with 1 method)

julia> LongestWord3("Hello, how are you? nested, punctuation?")
"punctuation?"

Upvotes: 3
