Kuchen

Reputation: 23

Python: How to get full match with RegEx

I'm trying to extract a link from some JavaScript. The JavaScript part isn't relevant anymore because I transformed it into a string (text).

Here is the script part:

<script>                
                					
					setTimeout("location.href = 'https://airdownload.adobe.com/air/win/download/30.0/AdobeAIRInstaller.exe';", 2000);
                
    
                $(function() {
                    $("#whats_new_panels").bxSlider({
                        controls: false,
                        auto: true,
                        pause: 15000
                    });
                });
                setTimeout(function(){
                    $("#download_messaging").hide();
                    $("#next_button").show();
                }, 10000);
            </script>

Here is what I do:

import re

def get_link_from_text(text):
   text = text.replace('\n', '')
   text = text.replace('\t', '')
   text = re.sub(' +', ' ', text)

   search_for = re.compile("href[ ]*=[ ]*'[^;]*")
   debug = re.search(search_for, text)

   return debug

What I want is the href link, and I kind of get it, but for some reason only like this:

<_sre.SRE_Match object; span=(30, 112), match="href = 'https://airdownload.adobe.com/air/win/dow>

and not the way I want it to be:

<_sre.SRE_Match object; span=(30, 112), match="href = 'https://airdownload.adobe.com/air/win/download/30.0/AdobeAIRInstaller.exe'">

So my question is how to get the full link and not only a part of it.

Might the problem be that re.search doesn't return longer strings? I tried altering the regex, and I even tried matching the link piece by piece, but it still returns only the part shown above.

Upvotes: 0

Views: 3079

Answers (1)

ttreis

Reputation: 131

I've modified your code slightly (it uses findall on the compiled pattern instead of re.search), and for me it now returns the complete string you want.

import re

text = """
<script>                

setTimeout("location.href = 'https://airdownload.adobe.com/air/win/download/30.0/AdobeAIRInstaller.exe';", 2000);


    $(function() {
        $("#whats_new_panels").bxSlider({
            controls: false,
            auto: true,
            pause: 15000
        });
    });

    setTimeout(function(){
        $("#download_messaging").hide();
         $("#next_button").show();
    }, 10000);
</script>
"""

def get_link_from_text(text):
   text = text.replace('\n', '')
   text = text.replace('\t', '')
   text = re.sub(' +', ' ', text)

   search_for = re.compile("href[ ]*=[ ]*'[^;]*")
   debug = search_for.findall(text)

   print(debug)

get_link_from_text(text)

Output:

["href = 'https://airdownload.adobe.com/air/win/download/30.0/AdobeAIRInstaller.exe'"]
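As far as I can tell, your original re.search call already matched the whole link: the printed representation of a match object only shows the first 50 characters of the match, so the text looks cut off even though it isn't. A minimal sketch of reading the text out of the match object instead of printing the object itself is below. The capturing group and the slightly changed pattern are my own additions for illustration, not something from your code, and the sample text is shortened from your script.

```python
import re

# Sample input, shortened from the script in the question.
text = """setTimeout("location.href = 'https://airdownload.adobe.com/air/win/download/30.0/AdobeAIRInstaller.exe';", 2000);"""

# Variation of the original pattern (an assumption, not the asker's regex):
# the parentheses capture only the URL between the quotes.
pattern = re.compile(r"href\s*=\s*'([^';]*)")

match = pattern.search(text)
if match is not None:
    full_match = match.group(0)  # everything the pattern matched
    link = match.group(1)        # only the captured URL
    print(full_match)
    print(link)
```

With this approach group(0) gives the whole matched text and group(1) gives just the link, so there is no need to strip the leading "href = '" afterwards.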

Upvotes: 1
