Supez38

Reputation: 349

Having trouble understanding regex modifiers in perl to convert to python

I'm having trouble converting these Perl regexes to Python, although I've converted simpler ones before. I don't really understand the /s and /is modifiers; I only know that /g means global.

I also don't know exactly what the first one does. The second removes a specific li tag containing a message from HTML files.

# First
$data =~ s/\]\((\/uploads\/.*?\.pdf)\)/\]\(ref\/\/\/docs$1\)/g;

# Second
$data =~ s/<li>.*?https:\/\/www\.example\.com.*?<\/li>/$test/is;
# What I think might work in python
data = re.sub('<li>.*?https:\/\/www\.example\.com.*?<\/li>/' + test, data, 1)

Upvotes: 1

Views: 94

Answers (1)

Kamal Nayan

Reputation: 1960

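The first regex matches a Markdown-style link target of the form ](/uploads/<anything>.pdf) and inserts ref///docs in front of the captured path. It changes nothing else.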

Explanation:

/\]\((\/uploads\/.*?\.pdf)\)/g
  • \] matches the character "]"
  • \( matches the character "("
  • 1st capturing group (\/uploads\/.*?\.pdf), made up of:
  • \/ matches the character "/"
  • uploads matches the characters "uploads" (case sensitive)
  • \/ matches the character "/"
  • . matches any character (except for line terminators)
  • *? is a lazy quantifier: it repeats the preceding . between zero and unlimited times, as few times as possible, expanding as needed
  • \. matches the character "."
  • pdf matches the characters "pdf" (case sensitive)
  • \) matches the character ")" (after the capturing group)

Global pattern flags

  • g modifier: global. All matches (don't return after first match)
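On the Python side there is no g flag: re.sub replaces every match by default, and its count argument limits the number of replacements. A minimal sketch of the difference, using a made-up sample string (the variable names are only for illustration):

```python
import re

# Hypothetical sample text with two Markdown-style links to uploaded PDFs.
text = "[a](/uploads/a.pdf) and [b](/uploads/b.pdf)"
pattern = r"\]\((/uploads/.*?\.pdf)\)"
replacement = r"](ref///docs\1)"

# Default count=0 replaces every match, like Perl's /g.
all_replaced = re.sub(pattern, replacement, text)

# count=1 replaces only the first match, like a Perl s/// without /g.
first_only = re.sub(pattern, replacement, text, count=1)

print(all_replaced)
print(first_only)
```

The forward slashes are left unescaped here because Python patterns have no / delimiter; escaping them as \/ is harmless but unnecessary.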

Consider the example:

test_str = "](/uploads/something.pdf)"

perl:

my $test_str = "](/uploads/something.pdf)";
$test_str =~ s/\]\((\/uploads\/.*?\.pdf)\)/\]\(ref\/\/\/docs$1\)/g;

python:

import re
test_str = "](/uploads/something.pdf)"
test_str = re.sub(r"\]\((/uploads/.*?\.pdf)\)", r"](ref///docs\1)", test_str)

Output of printing test_str after substitution:

](ref///docs/uploads/something.pdf)


I'm not sure what you want to achieve with the second regex, but the Perl code replaces the first <li>...</li> span that contains https://www.example.com with the value of $test. Since there is no /g, only the first match is replaced. Let's play around:

perl:

my $test = "test";
my $data = "<li>list 1 https://www.example.com/site </li>";
$data =~ s/<li>.*?https:\/\/www\.example\.com.*?<\/li>/$test/is;

python:

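import re
data = "<li>list 1 https://www.example.com/site </li>"
test = "test"
# Pass flags by keyword (the 4th positional argument of re.sub is count); count=1 mirrors the missing /g.
data = re.sub(r"<li>.*?https://www\.example\.com.*?</li>", test, data, count=1, flags=re.S | re.I)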

Output of printing data after substitution:

test
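One thing to watch for with this pattern: with the s modifier the leading .*? can cross earlier </li> tags, so the match starts at the first <li> in the input, not necessarily at the item that holds the URL. The sketch below shows this on a made-up list and, assuming the goal is to replace only the item containing the link, a variant that forbids </li> before the URL. The names html, loose and strict are hypothetical, and the function replacement is used so that test is inserted literally, without backslash processing.

```python
import re

# Hypothetical input: only the middle item links to example.com.
html = (
    "<ul>\n"
    "<li>keep me</li>\n"
    "<li>see https://www.example.com/site</li>\n"
    "<li>keep me too</li>\n"
    "</ul>"
)
test = "test"

# Same pattern as above: the match starts at the first <li> and runs on to the URL.
loose = r"<li>.*?https://www\.example\.com.*?</li>"
loose_result = re.sub(loose, lambda m: test, html, count=1, flags=re.S | re.I)

# Variant (an assumption about the intent): no </li> may appear before the URL,
# so only the item that actually contains the link is matched.
strict = r"<li>(?:(?!</li>).)*?https://www\.example\.com.*?</li>"
strict_result = re.sub(strict, lambda m: test, html, count=1, flags=re.S | re.I)

print(loose_result)   # the first item is swallowed along with the linked one
print(strict_result)  # only the linked item is replaced
```

For anything beyond simple, predictable markup, an HTML parser is usually the safer tool than a regex.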

Modifiers:

  • i means ignore case (case insensitive search); in Python this is re.I (re.IGNORECASE)
  • s means dot will now match any character (including newline); in Python this is re.S (re.DOTALL)
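To see what each modifier contributes, here is a small sketch with a made-up sample string that needs both: the tag is upper case and the content spans two lines. The inline form (?is) at the start of the pattern is an alternative to passing flags.

```python
import re

# Hypothetical sample: upper-case tag, content split across a newline.
sample = "<LI>first line\nsecond line</LI>"
pattern = r"<li>.*</li>"

# No flags: "<li>" is case sensitive and "." stops at the newline.
no_flags = re.search(pattern, sample)

# re.I alone: the tag now matches, but "." still cannot cross the newline.
ignore_case_only = re.search(pattern, sample, flags=re.I)

# re.S | re.I together behave like Perl's /is.
both = re.search(pattern, sample, flags=re.S | re.I)

# Inline flags give the same result without the flags argument.
inline = re.search(r"(?is)<li>.*</li>", sample)

print(no_flags, ignore_case_only)
print(both.group(0) == sample, inline.group(0) == sample)
```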

Upvotes: 1
