annakata

Reputation: 75844

How important is knowing Regexs?

My personal experience is that regexs solve problems that can't be efficiently solved any other way, and are so frequently required in a world where strings are as important as they are that not having a firm grasp of the subject would be sufficient reason for me to consider not hiring you as a senior programmer (a junior is always allowed the leeway of training).

However.

A number of responses on the recurrent "What's the regex for this?" type-questions suggest that a great many coders find them somewhere between unintelligible and opaque.

This is not about whether a simple indexOf or substring is a better solution. That's a technical matter: sometimes the simple way is correct, sometimes a regex is, and sometimes neither (looking at you, HTML parser questions).

This is about how important it is to understand Regexs and whether the anti-Regex opinion (that trite "...now they have two problems" thing) is merited or FUD.

Should a programmer be expected to understand Regexs? Is this a required skill?


edit: just in case it isn't clear, I'm not asking whether I need to learn them (I'm a defender of the faith) but whether the anti-camp are an evolutionary dead end or whether it's an unnecessary niche skill like InstallShield.

Upvotes: 10

Views: 2050

Answers (16)

chiccodoro

Reputation: 14716

I find it very useful to know regular expressions. They are a very powerful tool, and in my opinion there are problems that you simply can't solve without these.

I would however not treat regular expressions as a knock-out criterion for "hiring you as a senior programmer". They are like the wealth of other tools in the world. You should really know them in a problem domain where you need them, but you cannot presume that someone already knows all of these.

"a junior is always allowed the leeway of training"

If a senior isn't, then I would not hire him!

To the ones that argue how complex and unreadable a regular expression is: If the regexp solution to a problem is complex and unreadable, then probably the problem itself is! Good luck solving it another way...

Upvotes: 1

Noel Walters

Reputation: 1853

Knowing when to use a regexp and the basics of how they work and what their limitations are is important. But filling your head with a lot of syntax rules that you probably won't need very often is just a pointless academic exercise.

A regexp crib sheet can be written on one sheet of A4 paper or a couple of pages in a textbook, so there is no need to know this stuff by heart. If you use it every day it will stick. If you don't use it very often then the brain cells are probably better used for something else.

Upvotes: 0

Evan

Reputation: 755

I agree with pretty much everything said here, and just need to include the mandatory quip:

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems.

(attributed to Jamie Zawinski)

Like most jokes, it contains a kernel of truth.

Upvotes: -1

JSmyth

Reputation: 12163

I find that regexes can be very helpful depending on the type of programming that you do. However I probably write less than one regex a month, and because of this long interval between needing them I forget a lot about how they work.

I should probably go through Mastering Regular Expressions or something similar someday.

Upvotes: 0

Jose M Vidal

Reputation: 9152

What does the following do?

"([A-Za-z][A-Za-z0-9+.-]{1,120}:A-Za-z0-9/{1,333}(#([a-zA-Z0-9][a-zA-Z0-9$_.+!*,;/?:@&~=%-]{0,1000}))?)"

How long did it take you to figure out? To debug?

Regexs are awesome for single-use throwaway programs, but long hairy regexps are not the best choice for programs that other people will need to maintain over the years.

Upvotes: 0

sateesh

Reputation: 28683

I think knowing regex is quite an important skill. While the usage of regex in a programming environment/language is a question of maintainable code, I find the knowledge of regex to be useful with some commands (say egrep) and editors (vim, emacs etc.). Using a regex to do a find and replace in vim is very handy when you have a text file and you want to do some formatting once in a while.

Upvotes: 2

Learning

Reputation: 8185

The fact that all languages support regexs should mean something!

Upvotes: 2

paxdiablo

Reputation: 881613

REs let you solve relatively complex problems that would otherwise require you to code up full parsers with backtracking and all that messy sort of stuff. I liken the use of REs to using chainsaws to chop down a tree instead of trying to do it with a piece of celery.

Once you've learned how to use the chainsaw safely, you'll never go back. People who continue to spout anti-RE propaganda will never be as productive as those of us who have learned to love them.

So yes, you should know how to use REs, even if you understand only the basic constructs. They're a tool just like any other.

Upvotes: 26

Srikanth

Reputation: 12110

In Steve Yegge's article, Five Essential Phone Screen Questions, you should read the section "Area Number Three: Scripting and Regular Expressions".

Steve Yegge has some interesting points. He gives real world problems he has encountered with clients having to parse 50,000 files for a particular pattern of a phone number. The applicants who know regular expressions tear through the problem in a few minutes while those who don't write monster multi-hundred line programs that are very unwieldy. This article convinced me I should learn regular expressions.
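To give a feel for why the regex route is so short, here is a minimal sketch in Python. The phone number format (3-3-4 digits, optionally with parentheses around the area code) and the name `find_phone_numbers` are assumptions for illustration; the answer does not say which format was asked for.

```python
import re

# Assumed format: 3-3-4 digits, e.g. 555-123-4567 or (555) 123-4567.
# The \b anchors stop the pattern from matching inside longer digit runs.
PHONE = re.compile(r"\(?\b\d{3}\)?[-. ]?\d{3}[-. ]?\d{4}\b")

def find_phone_numbers(text):
    """Return every substring of text that looks like a phone number."""
    return PHONE.findall(text)
```

Scanning a directory tree would then just mean reading each file and calling `find_phone_numbers` on its contents.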

Upvotes: 3

Jon Skeet

Reputation: 1500835

There are some tasks where regular expressions are the best tool to use.
There are some tasks where regular expressions are pointlessly obscure.
There are some tasks where they're reasonably appropriate, but a different approach may be more readable.

In general, I think of using a regular expression when an actual pattern is involved. If you're just looking for a specific string, I wouldn't generally use a regex. As an example of a grey area, someone once asked on a newsgroup the best way to check whether one string contained any of a number of other strings. The two ways which came up were:

  • Build a regex with alternatives and perform a single match.
  • Test each string in turn with string.Contains.

Personally I think the latter way is much simpler - it doesn't require any thought about escaping the strings you're looking for, or any other knowledge of regular expressions (and their different flavours across different platforms).
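The two approaches can be sketched roughly as follows. This is Python rather than the .NET `string.Contains` mentioned above, and both function names are hypothetical.

```python
import re

def contains_any_regex(text, needles):
    """Build one alternation and run a single search."""
    if not needles:
        return False  # an empty pattern would otherwise match everything
    # Each needle must be escaped so characters like "+" or "." stay literal.
    pattern = "|".join(re.escape(n) for n in needles)
    return re.search(pattern, text) is not None

def contains_any_plain(text, needles):
    """Test each needle in turn with a plain substring check."""
    return any(n in text for n in needles)
```

Forgetting the `re.escape` call is exactly the kind of trap the plain version avoids: an unescaped needle `a+b` would match the text `abc`.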

As an example of somewhere that regular expressions are quite clearly the wrong choice, someone seriously proposed using a regular expression to test whether or not a string was three characters long. Their regular expression didn't even work, despite them claiming that the reason they thought of regular expressions first was that they'd been using them for so long, and that they naturally sort of "thought" in regular expressions.

There are, however, plenty of examples where regular expressions really do make life easier - as I say, when you're actually matching patterns: "I want one letter, then three digits, then another letter" or whatever. I don't find myself using regular expressions very often, but when I do use them, they save a lot of work.
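For instance, the "one letter, then three digits, then another letter" check might look like this in Python. It is a sketch assuming ASCII letters only, and `matches_code` is a hypothetical name.

```python
import re

# One ASCII letter, exactly three digits, one more ASCII letter.
LETTER_DIGITS_LETTER = re.compile(r"[A-Za-z][0-9]{3}[A-Za-z]")

def matches_code(s):
    """True only if the whole string has the letter-3-digits-letter shape."""
    return LETTER_DIGITS_LETTER.fullmatch(s) is not None
```

Writing the same check with loops and character tests is possible, but it takes noticeably more code for no gain in clarity.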

In short, I believe it's good to know regular expressions - but equally to be careful about when to use them. It's easy to end up with write-only code which could be made simpler to understand by rewriting with simple string operations, even if the resulting code is slightly longer.

EDIT: In response to the edit of the question...

I don't think it's a good idea to be evangelical about them - in my experience, that tends to lead to using them where an alternative would be simpler, and that just makes you look bad. On the other hand, if you come across someone writing complicated code to avoid using a regular expression, it's fine to point out that a regex would make the code simpler.

Personally I like to comment my regular expressions in quite a detailed way, splitting them up onto several lines with a comment between each line. That way they're easier to maintain, and it doesn't look like you're just trying to be "hard core" geeky (which can be the impression, even if it's not the actual intended aim).
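One way to get that kind of per-line commenting, sketched here in Python with its `re.VERBOSE` flag (other platforms offer similar options or plain string concatenation), using the "letter, three digits, letter" pattern mentioned earlier as the example:

```python
import re

# In verbose mode, whitespace in the pattern is ignored and "#" starts a comment.
CODE = re.compile(
    r"""
    [A-Za-z]    # one letter
    [0-9]{3}    # then exactly three digits
    [A-Za-z]    # then another letter
    """,
    re.VERBOSE,
)

def is_code(s):
    """True if the whole string matches the commented pattern above."""
    return CODE.fullmatch(s) is not None
```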

I think the most important thing is to remember that short != readable. Never claim that using a regex is better because it requires less code - claim that it's better when it's genuinely simpler and easier to understand (or where there's a significant performance benefit, of course).

Upvotes: 18

f3lix

Reputation: 29877

When you have to parse something (ranging from simple date strings to programming languages) you should know your tools and regular expressions are one of them.

But you should also know what you can do with regexes and what you can't. At this point it comes in handy if you know the Chomsky hierarchy. Otherwise you end up trying to use regular expressions to parse context-sensitive languages and wonder why you can't get your regex right.
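A classic illustration (chosen here as an example, not taken from the answer itself): arbitrarily nested balanced parentheses are not a regular language, so a classical regular expression cannot recognize them, while a simple counter can. A minimal Python sketch, with `is_balanced` as a hypothetical name:

```python
def is_balanced(s):
    """Check that every "(" has a matching ")" in the right order."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # a ")" appeared before its "("
    return depth == 0
```

The counter is the extra memory a finite automaton lacks, which is why this problem sits one level up the hierarchy.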

Upvotes: 2

Simon

Reputation: 80769

I have really mixed feelings. I have used them and know the bones of the syntax and something in me loves their conciseness. However they are not commonly understood and are a highly obfuscated form of code. I too would like to see performance comparisons against similar operations in plain code. There is no question that the exploded code will be more maintainable and more easily and widely understood, which is a serious consideration in any commercial software project.

Even if they turn out to be more performant, the argument for them taken to its logical conclusion would see us all embedding assembler into our code for important loops - perhaps we should. Neat and concise and very fast, but almost un-maintainable.

On balance I think that until the regex syntax becomes mainstream they probably cause more trouble than they solve and should be used only very carefully.

Upvotes: 3

ddaa

Reputation: 54464

It is not clear what kind of answer you are expecting.

I can imagine roughly three kinds of answer to this question:

  1. Regexen are essential to the education of professional programmers. They enable the use of the powerful Unix shell tools, and regex-based search-replace can dramatically cut down on the text-munging handiwork that is a part of a programmer's life. Programmers that do not know regexen are just intellectually lazy, which is a very bad trait for a programmer.

  2. Regexps are kinda useful depending on the application domain. Surely, knowing how to write regexps is a valuable tool in a programmer's chest, but most of the time you can do fine without using them. Also, regexps tend to be very hard to read, so abuse must be strongly discouraged.

  3. Some nutcases like to put regexs in everything (I'm looking at you, the perl guy who implemented a regex-based tetris in perl). But really, they are just a bit of computer science trivia whose only practical use is in writing parsers. They are widely taught because they make a good teaching topic on which to evaluate students, and like most such topics they can be forgotten the second you step out of the exam room.

You will notice the careful use of the plural forms "regexen" (pro), "regexps" (careful neutral) and "regexs" (con).

Personally, I am of the first kind. Good programmers like to learn new languages, and they hate repetitive handiwork.

Upvotes: 2

Sebastian Dietz

Reputation: 5706

As a developer you should know the pros and cons of as many tools as possible that could provide pre-made solutions for your problems. Every developer should know how to work with regular expressions and have a feeling for when they should be used and when it is better to use simple string functions to achieve a goal.

Rejecting them outright because they are hard to read is no option in my opinion. A developer who thinks so strips himself of a valuable tool for searching and validating complex string patterns.

Upvotes: 6

MrTelly

Reputation: 14865

Not a brilliant answer but everywhere I've worked the following holds true

0 < Number of people who (fully) understand regex < 1

If I knew how to do it I'd write that previous expression as a regex, but I can't. The best I could come up with on the fly is s/fully/a little/g - that's my limit (and that's probably not a regex).

A more serious answer is that the right regex will solve all kinds of problems with one(ish) line of code. But you'll have real problems debugging it if it goes wrong. Therefore IMHO a complex regex, however 'clean/clever', is a liability. If it takes ten lines of code to replicate it, why's that a problem? Is memory/disk space suddenly expensive again?

BTW I'd love to know if regexs are fast compared to the equivalent code.
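The honest answer is "measure it for your case". A rough sketch of how one might do that in Python with `timeit` follows; the "all digits" check, the sample input and every name here are assumptions picked for illustration, and the timings will vary by machine and regex engine.

```python
import re
import timeit

DIGITS = re.compile(r"[0-9]+")

def all_digits_regex(s):
    """Regex version: the whole string is one or more ASCII digits."""
    return DIGITS.fullmatch(s) is not None

def all_digits_plain(s):
    """Hand-written version of the same check."""
    return len(s) > 0 and all("0" <= ch <= "9" for ch in s)

def compare(sample="1234567890" * 10, number=10000):
    """Time both versions on the same input; returns seconds per version."""
    return {
        "regex": timeit.timeit(lambda: all_digits_regex(sample), number=number),
        "plain": timeit.timeit(lambda: all_digits_plain(sample), number=number),
    }
```

Calling `compare()` returns the two timings side by side, which is enough to settle the question for one particular pattern and input.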

Upvotes: 2

User

Reputation:

A developer thought he had one problem and tried to solve it using regex. Now he has 2 problems.

Upvotes: -1
