Reputation: 5619
I am trying to get better at regular expressions. I am using regex101.com. I have a regular expression that has two capturing groups. I am then using substitution to incorporate my captured values into another location.
For example I have a list of values:
fat dogs
thin cats
skinny cows
purple salamanders
etc...
and this captures them into two capture groups:
^([^\s]+)\s+([^\s;]+)?.*
which I then substitute into new sentences using $1 and $2. For example:
$1 animals like $2 are a result of poor genetics.
(obviously this is a silly example)
This works, and I get my sentences made, but I'm stumped trying to force $1 to have an uppercase first letter. I can find all sorts of examples of MATCHING uppercase or lowercase, but none of transforming to uppercase.
It seems I need to do some sort of "function" processing. I need to pass $1 to something that will then break it into two pieces... first letter and all the other letters.... transform piece one to uppercase... then smash back together and return the result.
Add to that error checking... and while it is unlikely $1 will have numeric values we should still do a safety check of some sort.
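The split, uppercase and rejoin idea described above can be sketched as a small function. Python is used here only for illustration, since the question doesn't name a language. `upper_first` is a hypothetical helper name, and treating an empty string or a non-letter first character as "leave it alone" is an assumption about what the safety check should do.

```python
def upper_first(word):
    """Uppercase the first character of word and leave the rest unchanged.

    Hypothetical helper: returns the input untouched when it is empty or
    when its first character is not a letter (e.g. a digit).
    """
    if not isinstance(word, str):
        raise TypeError("expected a string, got %r" % type(word).__name__)
    if not word or not word[0].isalpha():
        return word  # safety check: nothing sensible to capitalize
    first, rest = word[0], word[1:]  # break into two pieces
    return first.upper() + rest      # transform piece one, then rejoin


print(upper_first("fat"))   # Fat
print(upper_first("42nd"))  # 42nd
```

Unlike Python's built-in `str.capitalize()`, this leaves the rest of the word's case alone.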
What reading material do I need to look at here?
Upvotes: 3
Views: 2910
Reputation: 107347
Depending on your language of choice, I think this can be very simple. You can first loop over the list of values, match each one, and then put the groups into your string, calling a capitalize method on the first group:
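import re
my_list = ["fat dogs", "thin cats", "skinny cows", "purple salamanders"]
for val in my_list:
    m = re.match(r'^([^\s]+)\s+([^\s;]+)?.*', val)
    if m:  # skip lines that don't match the pattern
        print("%s animals like %s are a result of poor genetics." % (m.group(1).capitalize(), m.group(2)))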
But if you want to do it all with a regex, that is very unlikely to be possible: you need to modify the string, and that is generally not a suitable task for regex.
Upvotes: 2
Reputation: 5619
So in the end, the answer is that you CAN'T use regex to transform text; that's not its job. Thanks to the input from others, I was able to adjust my approach and still accomplish the objective of this self-inflicted academic assignment.
First, you'll recall from the OP that I had a list and was capturing two words from each line into regex variables. I modified that regex to use three capture groups instead. For example:
^(\S)(\S+)\s+(\S+)?.*
//would turn fat dogs into
//$1 = f, $2 = at, $3 = dogs
Then, in Notepad++, I replaced with this:
\u$1$2 animals like $3 are a result of poor genetics.
In this way I was able to transform the first letter to uppercase, but as others pointed out, this is NOT regex doing the transform but another process (in this case Notepad++, but it could be your C#, Perl, etc.).
Thank you, everyone, for helping the newbie.
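For comparison, here is a hedged Python sketch of the same three-group idea. Python's `re.sub` replacement templates have no `\u` escape like Notepad++'s (since Python 3.7 an unknown letter escape such as `\u` in a template is reported as `re.error`), so the uppercasing has to happen in a replacement function. `PATTERN` and `to_sentence` are illustrative names, not part of the original approach.

```python
import re

# Three-group pattern: group 1 = first letter, group 2 = rest of word, group 3 = second word.
PATTERN = re.compile(r"^(\S)(\S+)\s+(\S+)?.*")


def to_sentence(line):
    """Build the sentence, uppercasing group 1 in Python code rather than in a template."""
    return PATTERN.sub(
        lambda m: "%s%s animals like %s are a result of poor genetics."
        % (m.group(1).upper(), m.group(2), m.group(3) or ""),
        line,
    )


# A Notepad++-style template is rejected by Python's re module.
try:
    PATTERN.sub(r"\u\1\2", "fat dogs")
except re.error as exc:
    print("re.error:", exc)

print(to_sentence("fat dogs"))  # Fat animals like dogs are a result of poor genetics.
```

Once a function does the work, the split into `(\S)(\S+)` is no longer strictly needed; it is kept here only to mirror the Notepad++ version.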
Upvotes: 1
Reputation: 4346
Put very simply, regex can only replace from what is in your original string. There is no capital F in fat dogs, so you can't get Fat dogs as your output.
This is possible in Perl, however, but only because Perl treats the replacement part of a substitution like a double-quoted string, in which \u and \U are ordinary string escapes; it is not a feature of the regex itself. The following is a short Perl program (sans regex) that performs case transformation if run from the command line:
#!/usr/bin/perl -w
use strict;
print "fat dogs\n"; # fat dogs
print "\ufat dogs\n"; # Fat dogs
print "\Ufat dogs\n"; # FAT DOGS
The same escape sequences work in the replacement part of a substitution (s///) too:
#!/usr/bin/perl -w
use strict;
my $animal = "fat dogs";
$animal =~ s/(\w+) (\w+)/\u$1 \U$2/;
print $animal; # Fat DOGS
Let me repeat though, it is Perl doing this, not the regex.
Depending on your real-world example, you may not have to change the case of the letter at all. If your input is Fat dogs, then you will get the desired result. Otherwise, you will have to process $1 yourself.
In PHP you can use preg_replace_callback() to process the entire match, including captured groups, before returning the substitution string. Here is a similar PHP program:
<?php
$animal = "fat dogs";
print(preg_replace_callback('/(\w+) (\w+)/', 'my_callback', $animal)); // Fat DOGS
function my_callback($match) {
return ucfirst($match[1]) . ' ' . strtoupper($match[2]);
}
?>
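Python's `re.sub` can play the same role as preg_replace_callback(): when the replacement argument is a function, it is called with each match object and its return value is used as the replacement. A minimal sketch of the PHP program above, assuming Python 3; `my_callback` simply reuses the PHP name. The slicing mirrors `ucfirst()`, whereas `str.capitalize()` would also lowercase the rest of the word.

```python
import re


def my_callback(match):
    """Rough Python counterpart of the PHP callback: ucfirst() on group 1, strtoupper() on group 2."""
    first, second = match.group(1), match.group(2)
    # first[:1].upper() + first[1:] changes only the first character, like ucfirst()
    return first[:1].upper() + first[1:] + " " + second.upper()


animal = "fat dogs"
print(re.sub(r"(\w+) (\w+)", my_callback, animal))  # Fat DOGS
```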
Upvotes: 2
Reputation: 796
A regular expression will only match what is there. What you are doing is essentially:
but what you want to be doing is:
A regular expression doesn't do any 'processing' on the matches, it is just a syntax for finding the matches in the first place.
Most languages have string-processing functions. For instance, if you had your matches in the variables $1 and $2 as above, you would want to do something along the lines of:
$1 = upper(substring($1, 0, 1)) + substring($1, 1)
assuming upper() is your language's string-uppercasing function and substring() returns a substring (zero-indexed).
Upvotes: 2