audiophonic

Reputation: 171

Regex - match every possible char and space

I want to extract data from HTML. The problem is that I can't extract two strings, one at the top and one at the bottom of my pattern.

I want to extract 23423423423 and 1234523453245, but only if the string Allan appears between them:

                                        <h4><a href="/Profile/23423423423.html">@@@@@@</a>  </h4> said12:49:32
            </div>

                                <a href="javascript:void(0)" onclick="replyAnswer(@@@@@@@@@@,'GET','');" class="reportLink">
                    report                    </a>
                        </div>

        <div class="details">
                            <p class="content">


                       Hi there, Allan.



                                </p>

            <div id="AddAnswer1234523453245"></div>

Of course, I can do something like this: Profile\/(\d+).*\s*.*\s*.*\s*.*\s*.*\s*.*\s*.*\s*.*Allan.*\s*.*\s*.*AddAnswer(\d+). But that pattern is horrible. Is there a way to make it shorter?

I was thinking about:

Profile\/(\d+)(.\sAllan)*AddAnswer(\d+)

or

Profile\/(\d+)(.*Allan\s*)*AddAnswer(\d+)

but neither of them works properly. Do you have any ideas?

Upvotes: 1

Views: 70

Answers (3)

Jan

Reputation: 43199

It is better to use a parser instead. If you must use regular expressions for whatever reason, you might get by with a tempered greedy token (written here in verbose mode, so whitespace and comments in the pattern are ignored):

Profile/(\d+)            # Profile followed by digits
(?:(?!Allan)[\S\s])+     # any character except when there's Allan ahead
Allan                    # Allan literally
(?:(?!AddAnswer)[\S\s])+ # same construct as above
AddAnswer(\d+)           # AddAnswer, followed by digits

See a demo on regex101.com
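
As a rough sketch, the tempered pattern above could be tried in Python like this. Python is an assumption here (the question does not name a language), and SAMPLE and extract_ids are hypothetical names made up for the illustration. re.VERBOSE is what allows the pattern to keep its layout and comments.

```python
import re

# Tempered pattern from the answer; re.VERBOSE ignores the layout whitespace
# and treats everything after # as a comment.
PATTERN = re.compile(
    r"""
    Profile/(\d+)             # Profile followed by digits
    (?:(?!Allan)[\S\s])+      # any character, as long as Allan does not start here
    Allan                     # Allan literally
    (?:(?!AddAnswer)[\S\s])+  # any character, as long as AddAnswer does not start here
    AddAnswer(\d+)            # AddAnswer followed by digits
    """,
    re.VERBOSE,
)

# Hypothetical sample, shortened from the markup in the question.
SAMPLE = """
<h4><a href="/Profile/23423423423.html">@@@@@@</a>  </h4> said12:49:32
<div class="details">
    <p class="content">
       Hi there, Allan.
    </p>
<div id="AddAnswer1234523453245"></div>
"""


def extract_ids(html):
    """Return a list of (profile_id, answer_id) tuples, one per match."""
    return PATTERN.findall(html)


print(extract_ids(SAMPLE))
```

If Allan is missing from the text, the pattern cannot match and the list comes back empty.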

Upvotes: 0

Strikeskids

Reputation: 4052

You can construct a character class that matches any character, including newlines, by using [\S\s]. Whitespace and non-whitespace characters together cover every character.

With that, your attempts were reasonably close:

/Profile\/(\d+)[\S\s]*Allan[\S\s]*AddAnswer(\d+)/

This looks for Profile/, the number that comes after it, any characters up to Allan, any characters between Allan and AddAnswer, and the number that comes after AddAnswer. If you have single-line mode available (/s), you can use dots instead.

/Profile\/(\d+).*Allan.*AddAnswer(\d+)/s
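
A small sketch of the two variants in Python (an assumed language; SAMPLE and first_ids are hypothetical names). In Python the /s flag is spelled re.DOTALL. The third pattern is the dot version without the flag, included only to show that it stops at line breaks.

```python
import re

# Variant 1: [\S\s] matches any character, newlines included, with no flags.
CLASS_PATTERN = re.compile(r"Profile/(\d+)[\S\s]*Allan[\S\s]*AddAnswer(\d+)")

# Variant 2: re.DOTALL is Python's spelling of /s, so . matches newlines too.
DOTALL_PATTERN = re.compile(r"Profile/(\d+).*Allan.*AddAnswer(\d+)", re.DOTALL)

# Same dots without the flag: . stops at line breaks, so multi-line input fails.
PLAIN_PATTERN = re.compile(r"Profile/(\d+).*Allan.*AddAnswer(\d+)")

# Hypothetical sample, shortened from the markup in the question.
SAMPLE = (
    '<h4><a href="/Profile/23423423423.html">@@@@@@</a></h4>\n'
    '<p class="content">\n'
    "   Hi there, Allan.\n"
    "</p>\n"
    '<div id="AddAnswer1234523453245"></div>\n'
)


def first_ids(pattern, html):
    """Return (profile_id, answer_id) for the first match, or None."""
    match = pattern.search(html)
    return match.groups() if match else None


print(first_ids(CLASS_PATTERN, SAMPLE))
print(first_ids(DOTALL_PATTERN, SAMPLE))
print(first_ids(PLAIN_PATTERN, SAMPLE))
```

Both working variants are greedy, so on a page with several posts a single match can stretch from the first Profile to the last AddAnswer. The sketch only covers a single-post snippet like the one in the question.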


Upvotes: 2

chifung7

Reputation: 2621

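In some flavors, such as Ruby, the m flag makes . match newlines. In most others (PCRE, Python, JavaScript) that is the job of the s flag, and m only changes where ^ and $ match.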

/Profile\/(\d+).+Allan.+AddAnswer(\d+)/m
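
Which flag lets . cross line breaks depends on the regex flavor, so it is worth checking in the one you use. A minimal sketch in Python (an assumed language; PATTERN, SAMPLE and ids_with_flags are hypothetical names), with the Allan check from the question included in the pattern:

```python
import re

# Dot-based pattern with the Allan requirement from the question.
PATTERN = r"Profile/(\d+).+Allan.+AddAnswer(\d+)"

# Hypothetical three-line sample, shortened from the markup in the question.
SAMPLE = (
    'href="/Profile/23423423423.html"\n'
    "Hi there, Allan.\n"
    'id="AddAnswer1234523453245"\n'
)


def ids_with_flags(flags):
    """Search SAMPLE with the given flags; return the two IDs or None."""
    match = re.search(PATTERN, SAMPLE, flags)
    return match.groups() if match else None


# re.MULTILINE only makes ^ and $ match at every line; . still stops at \n.
multiline_result = ids_with_flags(re.MULTILINE)

# re.DOTALL makes . match \n as well, so the pattern can span the three lines.
dotall_result = ids_with_flags(re.DOTALL)

print(multiline_result)
print(dotall_result)
```

In Python only the re.DOTALL search finds the two IDs.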

Upvotes: 0
