Reputation: 3
I have a problem with unwanted text in my output. I want to extract only the http/https URLs from a file. My code is:
import sys
import os
import hashlib
import re

# Require a filename argument (message: "Usage: python <script> filename")
if len(sys.argv) < 2:
    sys.exit('Aby uzyc wpisz: python %s filename' % sys.argv[0])

# Abort if the file does not exist (message: 'ERROR!: File "<name>" not found!')
if not os.path.exists(sys.argv[1]):
    sys.exit('BLAD!: Plik "%s" nie znaleziony!' % sys.argv[1])

# Read the raw bytes and print the file hashes
with open(sys.argv[1], 'rb') as f:
    plik = f.read()
print("MD5: %s" % hashlib.md5(plik).hexdigest())
print("SHA1: %s" % hashlib.sha1(plik).hexdigest())
print("SHA256: %s" % hashlib.sha256(plik).hexdigest())

# "Podejrzane linki" means "Suspicious links"
print("Podejrzane linki: \n")

# Print every line that contains an http or https link
pliki = open(sys.argv[1], 'r')
for line in pliki:
    if re.search("(H|h)ttps:(.*)", line):
        print(line)
    elif re.search("(H|h)ttp:(.*)", line):
        print(line)
pliki.close()
Result:
MD5: f16a93fd2d6f2a9f90af9f61a19d28bd
SHA1: 0a9b89624696757e188412da268afb2bf5b600aa
SHA256: 3b365deb0e272146f00f9d723a9fd4dbeacddc10123aec8237a37c10c19fe6df
Podejrzane linki:
GrizliPolSurls = "http://xxx.xxx.xxx.xxx"
FilnMoviehttpsd.Open "GET", "https://xxx.xxx.xxx.xxx",False
I want only the strings that are enclosed in double quotes and start with http or https, e.g. http://xxx.xxx.xxx.xxx
Desired result:
MD5: f16a93fd2d6f2a9f90af9f61a19d28bd
SHA1: 0a9b89624696757e188412da268afb2bf5b600aa
SHA256: 3b365deb0e272146f00f9d723a9fd4dbeacddc10123aec8237a37c10c19fe6df
Podejrzane linki:
http://xxx.xxx.xxx.xxx
https://xxx.xxx.xxx.xxx
Upvotes: 0
Views: 164
Reputation: 91
re.search() returns a match object (or None if the pattern does not match), not the matched text.
You have to extract the matched text from the result with group():
import re
line = "my text line contains a http://192.168.1.1 magic url"
result = re.search(r"[Hh]ttps?://\d+\.\d+\.\d+\.\d+", line)
print(result.group())  # prints http://192.168.1.1
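Since re.search() returns None when nothing matches, calling group() directly can raise an AttributeError on lines without a link. A minimal sketch of a guarded version follows; the helper name first_url and the broader pattern (any run of characters other than whitespace and double quotes after the scheme, instead of digits only) are assumptions made for illustration, not part of the answer above.

```python
import re

# Hypothetical helper: returns the first http/https URL in a line, or None.
# The pattern is an assumption: it accepts any characters after "://"
# up to the next whitespace or double quote.
URL_PATTERN = re.compile(r'[Hh]ttps?://[^\s"]+')

def first_url(line):
    match = URL_PATTERN.search(line)
    # search() yields None when there is no match, so check before group()
    return match.group() if match else None

print(first_url('GrizliPolSurls = "http://xxx.xxx.xxx.xxx"'))
print(first_url('a line without any link'))
```

The first call prints the URL only, without the surrounding variable name or quotes; the second prints None instead of raising an error.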
Upvotes: 0
Reputation: 37367
You need this pattern: (?<=")http[^"]+
(?<=") - positive lookbehind; asserts that " precedes the current position.
http - matches http literally.
[^"]+ - matches everything up to the next "; this is the negated character class technique, which avoids the need for a lazy quantifier :)
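As a minimal sketch of how this pattern could be applied in Python, assuming the two sample lines from the question's output as input (the variable names text, pattern and links are illustrative):

```python
import re

# Sample lines taken from the output shown in the question.
text = '''GrizliPolSurls = "http://xxx.xxx.xxx.xxx"
FilnMoviehttpsd.Open "GET", "https://xxx.xxx.xxx.xxx",False'''

# (?<=") requires a double quote immediately before "http";
# [^"]+ then consumes characters up to, but not including, the next quote.
pattern = r'(?<=")http[^"]+'

# With no capturing group, findall returns the full text of each match.
links = re.findall(pattern, text)
print(links)
```

Note that the "http" inside FilnMoviehttpsd is skipped because it is not preceded by a double quote, and that the pattern as written matches lowercase http only.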
Upvotes: 1
Reputation: 20414
You can use re.findall with the following regex (explained on regex101): "([Hh]ttps?.*?)"
so:
import re

s = '''MD5: f16a93fd2d6f2a9f90af9f61a19d28bd
SHA1: 0a9b89624696757e188412da268afb2bf5b600aa
SHA256: 3b365deb0e272146f00f9d723a9fd4dbeacddc10123aec8237a37c10c19fe6df
Podejrzane linki:
GrizliPolSurls = "http://xxx.xxx.xxx.xxx"
FilnMoviehttpsd.Open "GET", "https://xxx.xxx.xxx.xxx",False'''

# The capturing group returns only the text between the double quotes
urls = re.findall('"([Hh]ttps?.*?)"', s)
print(urls)  # ['http://xxx.xxx.xxx.xxx', 'https://xxx.xxx.xxx.xxx']
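To connect this back to the loop in the question, here is a rough sketch that applies the same regex line by line and collects only the quoted URLs. The function name extract_quoted_urls and the sample list are assumptions for illustration; the question's script would pass its open file instead.

```python
import re

# Same regex as above: a double-quoted string starting with http or https.
QUOTED_URL = re.compile(r'"([Hh]ttps?.*?)"')

def extract_quoted_urls(lines):
    """Collect every double-quoted http/https URL from an iterable of lines."""
    urls = []
    for line in lines:
        # findall returns only the captured group, i.e. the URL without quotes
        urls.extend(QUOTED_URL.findall(line))
    return urls

# Sample input taken from the question's output.
sample = [
    'GrizliPolSurls = "http://xxx.xxx.xxx.xxx"\n',
    'FilnMoviehttpsd.Open "GET", "https://xxx.xxx.xxx.xxx",False\n',
]
for url in extract_quoted_urls(sample):
    print(url)
```

Because a text file object iterates over its lines, the open file from the question's script (pliki) could be passed to the function in place of the sample list.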
Upvotes: 2