Jonathan

Reputation: 11321

How can I use this regex in Haskell?

I'm trying to write a simple Haskell program that takes any line that looks like someFilenameHere0035.xml and returns 0035. My sample input file, input.txt, looks like this:

someFilenameHere0035.xml
anotherFilenameHere4465.xml

And running: cat input.txt | runhaskell getID.hs should return:

0035
4465

I'm having so much difficulty figuring this out. Here's what I have so far:

import Text.Regex.PCRE

getID :: String -> [String]
getID str = str =~ "([0-9]+)\\.xml" :: [String]

main :: IO ()
main = interact $ unlines . getID

But I get an error message I don't understand at all:

• No instance for (RegexContext Regex String [String])
 arising from a use of ‘=~’
• In the expression: str =~ "([0-9]+)\\.xml" :: [String]
   In an equation for ‘getID’:
   getID str = str =~ "([0-9]+)\\.xml" :: [String] (haskell-stack-ghc)

I feel like I'm really close, but I don't know where to go from here. What am I doing wrong?

Upvotes: 0

Views: 107

Answers (1)

James Burton

Reputation: 756

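First off, you only want the number part, so we can get rid of the \\.xml. When the result type is String, =~ returns the whole matched text rather than the capture group, so keeping \\.xml would give 0035.xml. Note that without the suffix the pattern matches the first run of digits anywhere in the line, which is fine for your sample input.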

There is an instance for RegexContext Regex String String (the result is the text of the first match), but none for RegexContext Regex String [String], hence the error.

So if we change the type signature to String -> String then that error is taken care of.
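As an aside, if you do want a list of matches from a single call, one option is the AllTextMatches wrapper that the regex libraries provide for exactly this purpose. The following is only a minimal sketch, not the approach taken in the rest of this answer; the name allNumbers is made up for illustration, and it assumes every run of digits in the input is an ID you want.

```haskell
import Text.Regex.PCRE

-- Hypothetical helper (not from the question): collect every run of
-- digits in the input as a list of strings.
allNumbers :: String -> [String]
allNumbers str =
  getAllTextMatches (str =~ "[0-9]+" :: AllTextMatches [] String)

main :: IO ()
main = interact $ unlines . allNumbers
```

Because this searches the whole input at once, it needs no splitting into lines, but it will also pick up digits that appear elsewhere in a filename.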

unlines expects a [String], so to test what we have at this point I wrote a quick function that wraps its argument in a list (there's probably a nicer way to do that, but that's not the point of the question):

toList :: a -> [a]
toList a = [a]

Running your command with main = interact $ unlines . toList . getID prints 0035, which is only the first match, so we're almost there.

getID is passed a String containing the whole file contents, and the filenames in it are conveniently separated by the \n character. So we can use splitOn "\n" from the Data.List.Split module (in the split package) to get our list of .xml filenames.

Then we simply need to map getID over that list (toList is no longer needed).

This gives us:

import Text.Regex.PCRE
import Data.List.Split

getID :: String -> String
getID str = str =~ "([0-9]+)"

main :: IO ()
main = interact $ unlines . map getID . splitOn "\n"

This gives me the desired output when I run your command.
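If you would rather keep the \\.xml suffix in the pattern and still get only the digits, a possible variant is to ask for the four-tuple result (text before the match, the match, text after, and the list of capture groups) and take the group. This is a sketch under a couple of assumptions of mine: lines with no match should produce an empty string, and Prelude's lines is used instead of splitOn "\n" because splitOn leaves a trailing empty string when the input ends with a newline.

```haskell
import Text.Regex.PCRE ((=~))

-- Variant of getID: returns the digits directly before ".xml",
-- or "" when the line does not match (that fallback is an assumption).
getID :: String -> String
getID str =
  case str =~ "([0-9]+)\\.xml" :: (String, String, String, [String]) of
    (_, _, _, [digits]) -> digits  -- exactly one capture group matched
    _                   -> ""      -- no match: the group list is empty

main :: IO ()
main = interact $ unlines . map getID . lines
```

With this version a name such as v2report0035.xml still yields 0035, because the digits must be followed by .xml.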

Hopefully this helps :)

Upvotes: 1
