Reputation: 24336
With Mathematica I always feel that strings are "second class citizens." Compared to a language such as Perl, one must juggle a lot of code to accomplish the same task.
The available functionality is not bad, but the syntax is uncomfortable. While there are a few shorthand forms such as <> for StringJoin and ~~ for StringExpression, most of the string functionality lacks such syntax and uses clumsy names like StringReplace, StringDrop, StringReverse, Characters, CharacterRange, FromCharacterCode, and RegularExpression.
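For reference, a minimal sketch of the two shorthand forms mentioned above next to their long names. The sample strings are made up for illustration.

```mathematica
(* <> is shorthand for StringJoin *)
"abc" <> "def"              (* same as StringJoin["abc", "def"] *)

(* ~~ is shorthand for StringExpression, used to build string patterns *)
StringMatchQ["abc123", (LetterCharacter ..) ~~ (DigitCharacter ..)]

(* Most other operations have only a long name *)
StringReverse[StringDrop["abc123", 3]]
```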
In Mathematica strings are handled like mathematical objects, allowing 5 "a" + "b" where "a" and "b" act as symbols. This is a feature that I would not change, even if that would not break stacks of code. Nevertheless it precludes certain terse string syntax, wherein the expression 5 "a" + "b" would be rendered "aaaaab", for example.
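To make the contrast concrete, here is a small sketch: the arithmetic form stays a symbolic sum, while building "aaaaab" currently needs explicit string functions. The use of ConstantArray is just one possible way to repeat a string.

```mathematica
(* Strings behave like symbols under arithmetic: this stays a symbolic sum *)
5 "a" + "b"

(* The string meaning has to be spelled out explicitly;
   StringJoin flattens the list of repeated pieces *)
StringJoin[ConstantArray["a", 5], "b"]   (* "aaaaab" *)
```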
Ideas that come to mind, either alone or in combination, are:
1. Overload existing functions to work on strings, e.g. Take, Replace, Reverse.
2. Use shortened names for string functions, e.g. StringReplace >> StrRpl, Characters >> Chrs, RegularExpression >> "RegEx".
3. Create new infix syntax for string functions, and possibly new string operations.
4. Create a new container for strings, e.g. str["string"], and then definitions for various functions. (This was suggested by Leonid Shifrin.)
5. A variation of (4): expand strings (automatically?) to characters, e.g. "string" >> str["s","t","r","i","n","g"], so that the characters can be seen by Part, Take, etc.
6. Call another language such as Perl from within Mathematica to handle string processing.
7. Create new string functions that conglomerate frequently used sequences of operations.
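As a rough illustration of the character-container idea (the variation on the str container), here is a minimal sketch. The helper names toStr and fromStr are hypothetical and not part of Mathematica; only str comes from the list above.

```mathematica
(* Hypothetical helpers: expand a string into a str container of characters,
   and collapse such a container back into an ordinary string *)
toStr[s_String] := str @@ Characters[s]
fromStr[s_str] := StringJoin @@ s

(* List functions now see the individual characters *)
Take[toStr["string"], 3]             (* str["s", "t", "r"] *)
toStr["string"][[2]]                 (* "t" *)
fromStr[Reverse[toStr["string"]]]    (* "gnirts" *)
```

The obvious cost is the explicit conversion in both directions, which is why the list asks whether the expansion could happen automatically.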
Upvotes: 3
Views: 826
Reputation: 5954
I think the reason these operations have String* names is that they differ slightly from their list counterparts. Specifically, compare Cases to StringCases.
Now, one way to achieve what you want is something like this:
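Begin["StringOverload`"];
{Drop, Cases, Take, Reverse};
Unprotect[String];
(* Map each list function to its string counterpart *)
ToStringHead[Drop] = StringDrop;
ToStringHead[Take] = StringTake;
ToStringHead[Cases] = StringCases;
ToStringHead[Reverse] = StringReverse;
(* Upvalue on String: redirect e.g. Take["abc", 2] to StringTake["abc", 2].
   rest___ (three underscores) also matches one-argument calls such as Reverse["abc"] *)
String /:
  HoldPattern[(h : Drop | Cases | Take | Reverse)[s_String, rest___]] :=
    With[{head = ToStringHead[h]}, head[s, rest]];
(* Delete every upvalue of String that mentions one of the overloaded functions *)
RemoveOverloading[] :=
  UpValues[String] =
    DeleteCases[UpValues[String],
      x_ /; ! FreeQ[Unevaluated[x], (Drop | Cases | Take | Reverse)]];
End[];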
You can load this with Get or Needs, and remove the overloading by calling RemoveOverloading[] with the correct context, i.e. StringOverload`RemoveOverloading[].
In[21]:= Cases["this is a sentence", RegularExpression["\\s\\w\\w\\s"]]
Out[21]= {" is "}
In[22]:= Take["This is dangerous", -9]
Out[22]= "dangerous"
In[23]:= Drop["This is dangerous", -9]
Out[23]= "This is "
I do not think doing this is the right way to go, though. You might instead consider introducing shorter symbols in some context which would automatically evaluate to the String* symbols.
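A minimal sketch of that alternative, assuming a hypothetical context named ShortStr` and reusing the short names proposed in the question (StrRpl, Chrs, RegEx). Each short symbol simply evaluates to the corresponding built-in, so no system symbol has to be unprotected.

```mathematica
(* Hypothetical package context holding the short aliases *)
BeginPackage["ShortStr`"];
StrRpl = StringReplace;
Chrs = Characters;
RegEx = RegularExpression;
EndPackage[];

(* The aliases evaluate to the built-ins before being applied *)
StrRpl["banana", "an" -> "AN"]          (* "bANANa" *)
Chrs["abc"]                             (* {"a", "b", "c"} *)
StringCases["a1b22", RegEx["\\d+"]]     (* {"1", "22"} *)
```

Because the aliases live in their own context, they can be kept off $ContextPath when not wanted, which is less invasive than attaching upvalues to String.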
Upvotes: 5