help

Reputation: 120

Find multiple entities in a string using a regex in Python

My input string contains various entities like this: conn_type://host:port/schema#login#password

I want to extract all of them using regexes in Python.

So far, I can only find them one by one, like this:

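import re

test_string = "conn_type://host:port/schema#login#password"

# [a-zA-Z_] rather than [a-zA-Z]; otherwise the match stops at "conn"
conn_type = re.search(r'[a-zA-Z_]+', test_string)
if conn_type:
    print("conn_type:", conn_type.group())
    next_substr_len = conn_type.end()
    host = re.search(r'[^:/]+', test_string[next_substr_len:])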

and so on.

Is there a way to do this without all the if/else checks? I expect there is, but I haven't been able to find it. Note that each entity has a different regex.

Please help; I don't want to write boring, repetitive code.
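A minimal sketch of the one-regex-per-entity approach written as a loop instead of a chain of if blocks. It assumes the entities always appear in the fixed order conn_type, host, port, schema, login, password, separated as in the sample. The `ENTITIES` table, the `parse` function and the individual sub-patterns are hypothetical; the key call is the compiled pattern's `match(string, pos)`, which anchors each entity regex where the previous one stopped, so only a single failure check is needed.

```python
import re

# Hypothetical (name, entity_regex, separator) table: each entity keeps its own
# regex, and the separator is the literal text expected right after it.
ENTITIES = [
    ("conn_type", r"\w+", "://"),
    ("host", r"[^:/]+", ":"),
    ("port", r"[A-Za-z0-9]+", "/"),
    ("schema", r"\w+", "#"),
    ("login", r"\w+", "#"),
    ("password", r"\w+", ""),
]

# Compile one pattern per entity: the entity in group 1, then its separator.
COMPILED = [
    (name, re.compile("(" + regex + ")" + re.escape(sep)))
    for name, regex, sep in ENTITIES
]


def parse(text):
    """Match each entity in turn, starting where the previous one ended."""
    result, pos = {}, 0
    for name, pattern in COMPILED:
        m = pattern.match(text, pos)  # anchored at pos, unlike re.search
        if m is None:
            raise ValueError(f"could not parse {name!r} at position {pos}")
        result[name] = m.group(1)
        pos = m.end()
    return result


print(parse("conn_type://host:port/schema#login#password"))
```

Adding or changing an entity then means editing one row of the table rather than adding another if block.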

Upvotes: 1

Views: 110

Answers (3)

gregory

Reputation: 12885

>>> import re
>>> uri = "conn_type://host:port/schema#login#password"
>>> res = re.findall(r'(\w+)://(.*?):([A-Za-z0-9]+)/(\w+)#(\w+)#(\w+)', uri)
>>> res
[('conn_type', 'host', 'port', 'schema', 'login', 'password')]

No need for ifs. Use findall or finditer to search through your collection of connection strings, then filter the resulting list of tuples as needed.
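A sketch of the same findall/finditer idea using named groups, so that each entity keeps its own sub-pattern and every match comes back as a dict instead of a positional tuple. `CONN_RE` and `parse_all` are hypothetical names, and the sub-patterns are assumptions fitted to the sample format. The port sub-pattern is written as `[A-Za-z0-9]` rather than `[A-z0-9]`, because the ASCII range `A-z` also includes the characters `[ \ ] ^ _` and the backtick.

```python
import re

# One named group per entity; the sub-patterns are assumptions chosen to fit
# strings like conn_type://host:port/schema#login#password.
CONN_RE = re.compile(
    r"(?P<conn_type>\w+)://"
    r"(?P<host>[^:/\s]+):"
    r"(?P<port>[A-Za-z0-9]+)/"
    r"(?P<schema>\w+)#"
    r"(?P<login>\w+)#"
    r"(?P<password>\w+)"
)


def parse_all(text):
    """Return one dict per connection string found anywhere in text."""
    return [m.groupdict() for m in CONN_RE.finditer(text)]


text = ("conn_type://host:port/schema#login#password some noise "
        "conn_type://host:port/schema#login#email")
for conn in parse_all(text):
    print(conn["host"], conn["login"], conn["password"])
```

Text between connection strings is skipped automatically, because finditer only yields the places where the whole pattern matches.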

Upvotes: 1

C. Carley

Reputation: 133

Why don't you use re.findall?

Here is an example:

import re

s = 'conn_type://host:port/schema#login#password asldasldasldasdasdwawwda conn_type://host:port/schema#login#email'

def get_all_matches(s):
    # Raw string; '/' is not special in Python regexes, so it needs no backslash
    matches = re.findall(r'[a-zA-Z]+_[a-zA-Z]+:/+[a-zA-Z]+:+[a-zA-Z]+/+[a-zA-Z]+#+[a-zA-Z]+#[a-zA-Z]+', s)
    return matches

print(get_all_matches(s))

This returns a list of every substring that matches the regex; for the example above, that is:

['conn_type://host:port/schema#login#password', 'conn_type://host:port/schema#login#email']

If you need help building regex patterns in Python, I recommend the following website:

A pretty neat online regex tester

Also see the re module's documentation for more on re.findall:

Documentation for re.findall

Hope this helps!

Upvotes: 2

Alex Svetkin

Reputation: 1409

If you'd rather do it yourself, consider writing a tokenizer. This is a very elegant, Pythonic solution.
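A minimal tokenizer sketch in the style of the "Writing a Tokenizer" example in the Python `re` documentation: each token type gets its own named regex, the alternatives are joined with `|`, and `re.finditer` plus `Match.lastgroup` reports which token type matched. The token names, the `FIELDS` list, and the rule that WORD tokens map to fields in order are assumptions made for this sample format.

```python
import re

# Hypothetical token specification: (token name, regex). Alternatives are tried
# left to right, so "://" must come before ":" and "/".
TOKEN_SPEC = [
    ("SCHEME_SEP", r"://"),
    ("COLON", r":"),
    ("SLASH", r"/"),
    ("HASH", r"#"),
    ("WORD", r"[^:/#\s]+"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{regex})" for name, regex in TOKEN_SPEC))


def tokenize(text):
    """Yield (kind, value) pairs; lastgroup names the alternative that matched."""
    for mo in TOKEN_RE.finditer(text):
        if mo.lastgroup != "SKIP":
            yield mo.lastgroup, mo.group()


# Assumption: the WORD tokens appear in this fixed order.
FIELDS = ["conn_type", "host", "port", "schema", "login", "password"]
tokens = list(tokenize("conn_type://host:port/schema#login#password"))
words = [value for kind, value in tokens if kind == "WORD"]
print(dict(zip(FIELDS, words)))
```

Unlike the documentation's version, this sketch silently skips characters that match no token; a production tokenizer would normally add a catch-all token and raise an error on it.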

Or use the standard library: https://docs.python.org/3/library/urllib.parse.html. Note, however, that your sample URL is not fully valid: 'conn_type' is not a valid scheme (URL schemes cannot contain underscores), and it has two '#' fragment markers, so urlparse won't work as expected. For real-life URLs, though, I highly recommend this approach.
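A short sketch with `urllib.parse.urlsplit` (the sibling of `urlparse` that does not split off `;params`). It shows the sample being rejected as a URL with a scheme and host, and then a hypothetical valid variant, `conn-type://host:5432/schema#login#password`, where the hyphenated scheme and numeric port are assumptions. Because the fragment is everything after the first `#`, the login and password still have to be split by hand.

```python
from urllib.parse import urlsplit

# The sample as written: "conn_type" contains "_", which is not allowed in a
# URL scheme, so urlsplit recognises neither a scheme nor a host.
bad = urlsplit("conn_type://host:port/schema#login#password")
print(repr(bad.scheme), repr(bad.netloc))

# Hypothetical valid variant: hyphenated scheme and a numeric port.
parts = urlsplit("conn-type://host:5432/schema#login#password")
login, password = parts.fragment.split("#")  # fragment is "login#password"
print(parts.scheme, parts.hostname, parts.port, parts.path, login, password)
```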

Upvotes: 1
