xnoob
xnoob

Reputation: 23

How to combine string | REGEX

import re

def tst():
  text = '''
  <script>
  '''
  if proxi := re.findall(r"(?:<td\s[^>]*?><font\sclass\=spy14>(.*?)<script.*?\"\+(.*?)\)<\/script)", text):
    for proxy, port in proxi:
      yield f"{proxy}:{''.join(port)}"
    
    if dtt := re.findall(r"<td colspan=1><font class\=spy1><font class\=spy14>(.*?)</font> (\d+[:]\d+) <font class\=spy5>([(]\d+ \w+ \w+[)])", text):
      for date, time, taken in dtt:
        yield f"{date} {' '.join([time, taken])}"
   
    return None
  return None

for proxy in tst():
  print(proxy)

output that i get

51.155.10.0:8000
178.128.96.80:7497
98.162.96.41:4145
27-oct-2022 11:05 (49 mins ago)
27-oct-2022 11:04 (50 mins ago)
27-oct-2022 11:03 (51 mins ago)

so i use this regex below to capture group from output

(\w+[.]\w+[.]\w+[.]\w+[:]\w+)|(\w+.*)

i want the result like this, how to combine it from output?

157.245.247.84:7497 - 27-oct-2022 11:05 (49 mins ago)
184.190.137.213:8111 - 27-oct-2022 11:04 (50 mins ago)
202.149.89.67:7999 - 27-oct-2022 11:03 (51 mins ago)

Upvotes: 1

Views: 112

Answers (4)

trincot
trincot

Reputation: 351403

Assuming that the code at the top of your (edited) question has regular expressions that work perfectly, and they run the same number of matches, you could use zip:

import re

def tst():
    text = '''
    <script>
    '''
    proxi = re.findall(r"(?:<td\s[^>]*?><font\sclass\=spy14>(.*?)<script.*?\"\+(.*?)\)<\/script)", text)
    dtt = re.findall(r"<td colspan=1><font class\=spy1><font class\=spy14>(.*?)</font> (\d+[:]\d+) <font class\=spy5>([(]\d+ \w+ \w+[)])", text)
    if proxi and dtt:
        for (proxy, port), (date, time, taken) in zip(proxi, dtt):
            yield f"{proxy}:{''.join(port)} {date} {' '.join([time, taken])}"
   
for proxy in tst():
    print(proxy)

Upvotes: 1

Ramesh
Ramesh

Reputation: 585

using regex

import re

text = '''
157.245.247.84:7497
184.190.137.213:8111
202.149.89.67:7999
27-oct-2022 11:05 (49 mins ago)
27-oct-2022 11:04 (50 mins ago)
27-oct-2022 11:03 (51 mins ago)
'''
ip_regex = r"(?:\d{1,3}\.){3}\d{1,3}\:\d{4}"
time_regex = r'\d{2}\-\w+\-\d{4}\s\d{2}\:\d{2}\s\(.+\)'

ip_list = re.findall(ip_regex, text)
time_list = re.findall(time_regex, text)

for i in range(len(ip_list)):
    print(f'{ip_list[i]} - {time_list[i]}')


>>> 157.245.247.84:7497 - 27-oct-2022 11:05 (49 mins ago)
>>> 184.190.137.213:8111 - 27-oct-2022 11:04 (50 mins ago)
>>> 202.149.89.67:7999 - 27-oct-2022 11:03 (51 mins ago)

Upvotes: 1

Adon Bilivit
Adon Bilivit

Reputation: 27424

Providing the text is guaranteed to contain N lines of IP addresses followed by N lines of "timestamps" then you could do this:

text = '''157.245.247.84:7497
184.190.137.213:8111
202.149.89.67:7999
27-oct-2022 11:05 (49 mins ago)
27-oct-2022 11:04 (50 mins ago)
27-oct-2022 11:03 (51 mins ago)'''

lines = text.splitlines()

for ip, t in zip(lines, lines[len(lines)//2:]):
    print(f'{ip} - {t}')

Output:

157.245.247.84:7497 - 27-oct-2022 11:05 (49 mins ago)
184.190.137.213:8111 - 27-oct-2022 11:04 (50 mins ago)
202.149.89.67:7999 - 27-oct-2022 11:03 (51 mins ago)

Upvotes: 0

Tim Biegeleisen
Tim Biegeleisen

Reputation: 522817

This approach reads all lines into a list, then iterates the IP lines and date lines in tandem to generate the output.

text = '''157.245.247.84:7497
184.190.137.213:8111
202.149.89.67:7999
27-oct-2022 11:05 (49 mins ago)
27-oct-2022 11:04 (50 mins ago)
27-oct-2022 11:03 (51 mins ago)'''
lines = text.split('\n')
output = []
for i in range(0, len(lines) / 2):
    val = lines[i] + ' - ' + lines[i + len(lines)/2]
    output.append(val)

print('\n'.join(output))

This prints:

157.245.247.84:7497 - 27-oct-2022 11:05 (49 mins ago)
184.190.137.213:8111 - 27-oct-2022 11:04 (50 mins ago)
202.149.89.67:7999 - 27-oct-2022 11:03 (51 mins ago)

Note that this answer assumes each IP line would always have exactly one matching date line. It also assumes that the lines are ordered, and that all IP lines come before the date lines.

Upvotes: 1

Related Questions