Mr.Wizard

Reputation: 24336

Make string manipulation more convenient in Mathematica

With Mathematica I always feel that strings are "second-class citizens." Compared with a language such as Perl, Mathematica makes me juggle a lot of code to accomplish the same task.

The available functionality is not bad, but the syntax is uncomfortable. While there are a few shorthand forms, such as <> for StringJoin and ~~ for StringExpression, most of the string functionality lacks such syntax and uses long, clumsy names such as StringReplace, StringDrop, StringReverse, Characters, CharacterRange, FromCharacterCode, and RegularExpression.
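For comparison, here are the two operator forms mentioned above next to two functions that have no operator form and must be called by name (an illustrative sketch; the sample strings are arbitrary):

```mathematica
(* built-in shorthands: <> is StringJoin, ~~ is StringExpression *)
"string" <> "join"                                     (* "stringjoin" *)
StringMatchQ["a7", LetterCharacter ~~ DigitCharacter]  (* True *)

(* most other string functions have no operator form and are called by name *)
StringReplace["a-b-c", "-" -> "+"]                     (* "a+b+c" *)
StringReverse["abc"]                                   (* "cba" *)
```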

In Mathematica strings are handled like mathematical objects, allowing 5 "a" + "b", where "a" and "b" act as symbols. This is a feature I would not change, even if changing it would not break stacks of existing code. Nevertheless, it precludes certain terse string syntax, for example one in which the expression 5 "a" + "b" would evaluate to "aaaaab".
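A minimal sketch of the conflict: without string-specific rules, 5 "a" + "b" is an ordinary algebraic sum, so the "aaaaab" reading has to be spelled out with string functions. StringRepeat assumes Mathematica 10.1 or later; the Table form should also work in older versions.

```mathematica
(* with no string-specific rules, 5 "a" + "b" stays an algebraic expression *)
expr = 5 "a" + "b";
Head[expr]                           (* Plus *)

(* the "aaaaab" reading must be written out explicitly *)
StringRepeat["a", 5] <> "b"          (* "aaaaab" *)
StringJoin[Table["a", {5}], "b"]     (* "aaaaab"; StringJoin flattens the list *)
```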


What is the best way to make string manipulation more convenient in Mathematica?

Ideas that come to mind, either alone or in combination, are:

  1. Overload existing functions to work on strings, e.g. Take, Replace, Reverse.

    • This was the original topic of my question, to which Sasha replied. He considered it inadvisable.

  2. Use shortened names for string functions, e.g. StringReplace → StrRpl, Characters → Chrs, RegularExpression → RegEx.

  3. Create new infix syntax for string functions, and possibly new string operations.

  4. Create a new container for strings, e.g. str["string"], and then definitions for various functions. (This was suggested by Leonid Shifrin.)

  5. A variant of (4): expand strings (automatically?) to characters, e.g. "string" → str["s","t","r","i","n","g"], so that the characters can be seen by Part, Take, etc.

  6. Call another language, such as Perl, from within Mathematica to handle string processing.

  7. Create new string functions that conglomerate frequently used sequences of operations.
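Parts of ideas 3 and 5 can be tried with built-in syntax alone. In this sketch the general infix form x ~f~ y (which means f[x, y]) stands in for new infix syntax, and str is a hypothetical head with no definitions attached, used only to show how Part, Take and Reverse see the characters once a string is expanded.

```mathematica
(* idea 3: x ~f~ y is f[x, y], so any two-argument string function can be infix *)
"This is a test" ~StringTake~ 4          (* "This" *)
"a-b-c" ~StringReplace~ ("-" -> "+")     (* "a+b+c"; the parentheses are needed *)

(* idea 5: expand the string into the hypothetical container str *)
chars = str @@ Characters["string"]      (* str["s", "t", "r", "i", "n", "g"] *)
Take[chars, 3]                           (* str["s", "t", "r"] *)
chars[[2]]                               (* "t" *)
StringJoin @@ Reverse[chars]             (* "gnirts" *)
```

The trade-off of the container approach is that every result has to be converted back with StringJoin before it can be passed to ordinary string functions.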

Upvotes: 3

Views: 826

Answers (1)

Sasha

Reputation: 5954

I think these operations have String* names because they differ in small ways from their list counterparts. Compare, for example, Cases with StringCases.
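One concrete instance of such a difference (an illustration added here, not part of the original answer): the third argument means something different in the two functions.

```mathematica
(* Cases: third argument is a level specification, fourth is a maximum count *)
Cases[{1, "x", 2, 3}, _Integer, {1}, 2]         (* {1, 2} *)

(* StringCases: third argument is already the maximum count,
   and the matches are substrings rather than whole elements *)
StringCases["a1b22c333", DigitCharacter .., 2]  (* {"1", "22"} *)
```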

That said, here is one way to achieve what you want:

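Begin["StringOverload`"];
(* String is Protected; unprotect it so rules can be attached to it *)
Unprotect[String];
(* map each list function to its String* counterpart *)
ToStringHead[Drop] = StringDrop;
ToStringHead[Take] = StringTake;
ToStringHead[Cases] = StringCases;
ToStringHead[Reverse] = StringReverse;
(* store the rule as an UpValue of String, so Drop, Cases, Take and Reverse
   themselves are not modified; rest___ also matches Reverse["..."] *)
String /: 
 HoldPattern[(h : Drop | Cases | Take | Reverse)[s_String, rest___]] :=
  With[{head = ToStringHead[h]}, head[s, rest]]
(* delete exactly the UpValues created above *)
RemoveOverloading[] := 
 UpValues[String] = 
  DeleteCases[UpValues[String], 
   x_ /; ! FreeQ[Unevaluated[x], (Drop | Cases | Take | Reverse)]]
End[];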

You can load this code with Get or Needs, and remove the overloading by calling RemoveOverloading[] with its full context, i.e. StringOverload`RemoveOverloading[].

In[21]:= Cases["this is a sentence", RegularExpression["\\s\\w\\w\\s"]]

Out[21]= {" is "}

In[22]:= Take["This is dangerous", -9]

Out[22]= "dangerous"

In[23]:= Drop["This is dangerous", -9]

Out[23]= "This is "

I do not think doing this is the right way to go, though. You might instead consider introducing shorter symbols in a separate context that automatically evaluate to the String* symbols.
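A minimal sketch of that suggestion, assuming a hypothetical package context StrShort` and reusing the abbreviations proposed in idea 2 of the question. Each alias is a plain assignment, so StrRpl[...] evaluates its head to StringReplace before anything else happens.

```mathematica
(* hypothetical context and alias names; nothing built-in is modified *)
BeginPackage["StrShort`"];
StrRpl = StringReplace;   (* StrRpl[args] evaluates to StringReplace[args] *)
Chrs = Characters;
RegEx = RegularExpression;
EndPackage[];             (* adds StrShort` to $ContextPath *)

StrRpl["a-b-c", "-" -> "+"]             (* "a+b+c" *)
Chrs["abc"]                             (* {"a", "b", "c"} *)
StringCases["a1b22", RegEx["[0-9]+"]]   (* {"1", "22"} *)
```

Unlike the UpValue approach above, this leaves Drop, Take and the other built-ins untouched; the trade-off is that results and error messages still show the full String* names.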

Upvotes: 5
