Reputation: 19
In Python, I want to extract a substring up to a given input word.
Consider the following string:
"Name: abc and Age:24"
I want to extract the strings "Name: abc and"
and "Age:24"
separately.
I am currently using the following pattern:
re.search(r'\S+\s*:[\S\s]+', pattern)
but the output is the whole string.
Upvotes: 1
Views: 3495
Reputation: 174796
You need to use re.findall.
>>> import re
>>> s = "Name: abc and Age:24"
>>> re.findall(r'\S+\s*:.*?(?=\s*\S+\s*:|$)', s)
['Name: abc and', 'Age:24']
>>> re.findall(r'[^\s:]+\s*:.*?(?=\s*[^\s:]+\s*:|$)', s)
['Name: abc and', 'Age:24']
[^\s:]+ matches any character other than : or whitespace, one or more times. So this matches the key part.
\s*: matches zero or more spaces and the colon symbol.
.*? matches zero or more characters non-greedily, up to the point where the lookahead succeeds.
(?=\s*[^\s:]+\s*:|$) succeeds just before the next key part or at the end of the line.
(?=...) is called a positive lookahead. It asserts whether a match is possible at that position but does not consume any characters.
OR
You could use re.split.
>>> re.split(r'\s+(?=[^\s:]+\s*:)', s)
['Name: abc and', 'Age:24']
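As a rough sketch of how the lookahead pattern above could be wrapped for reuse, the following turns the matched pieces into a dictionary. The helper names split_fields and to_dict are made up for this illustration and are not part of the re module. It assumes keys never contain whitespace or colons.

```python
import re

# Pattern from the answer above: a key, a colon, then lazily everything
# up to the next key (or the end of the string).
FIELD = re.compile(r'[^\s:]+\s*:.*?(?=\s*[^\s:]+\s*:|$)')

def split_fields(text):
    """Return each 'key: value' chunk of text (hypothetical helper)."""
    return FIELD.findall(text)

def to_dict(text):
    """Map each key to its stripped value (hypothetical helper)."""
    result = {}
    for field in split_fields(text):
        # partition splits on the first colon only
        key, _, value = field.partition(':')
        result[key.strip()] = value.strip()
    return result

s = "Name: abc and Age:24"
print(split_fields(s))  # ['Name: abc and', 'Age:24']
print(to_dict(s))       # {'Name': 'abc and', 'Age': '24'}
```

Because the lookahead does not consume the next key, each call to findall resumes right where the next field begins.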
Upvotes: 0
Reputation: 107347
You can use re.findall:
>>> import re
>>> s="Name: abc and Age:24"
>>> re.findall(r'[A-Za-z]+:[a-z\s]+|[A-Za-z]+:\d+',s)
['Name: abc and ', 'Age:24']
In the preceding pattern, since the keys in your string (Age and Name) start with uppercase letters, you can use [A-Za-z]+ to match them. That matches any combination of uppercase and lowercase letters of length 1 or more. For the rest of the string after : the first alternative uses only lowercase letters and whitespace ([a-z\s]+). The second alternative matches the key the same way, but after : it matches digits of length 1 or more (\d+).
If the second part may contain a string after : you can use \w instead of \d:
>>> s = "Name: abc def ghi Location:Earth"
>>> re.findall(r'[A-Za-z]+:[a-z\s]+|[A-Za-z]+:\w+',s)
['Name: abc def ghi ', 'Location:Earth']
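Note that [a-z\s]+ also consumes the space before the next key, which is why the first item in each result above ends with a trailing space. A minimal sketch of tidying that up, assuming that stripping surrounding whitespace is acceptable for your data (the function name extract_pairs is invented for this example):

```python
import re

# Same alternation as in the answer above.
PAIR = re.compile(r'[A-Za-z]+:[a-z\s]+|[A-Za-z]+:\w+')

def extract_pairs(text):
    """Return the matched 'key:value' pieces without surrounding whitespace."""
    return [piece.strip() for piece in PAIR.findall(text)]

print(extract_pairs("Name: abc and Age:24"))
# ['Name: abc and', 'Age:24']
```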
Upvotes: 1
Reputation: 59
You could use this regex:
\w+[:]\w+|\w+[:](\s)\w+|\w+(\s)[:]\w+
Here's a breakdown:
\w+[:]\w+
\w matches a word character and + means one or more of them, so \w+ matches a whole word; [:] matches a colon character. Together this matches a word, a colon, and the word after it. The other two alternatives work the same way, except that (\s) allows one whitespace character after or before the colon.
The | symbol is the OR operator, which I use to check whether a space follows or precedes the colon.
The regex gets the words immediately before and after a colon, and it also works when there is a space before or after the colon.
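Here is one way this regex might be tried out in Python, as a sketch rather than a definitive solution. It uses re.finditer with group(0), because the pattern contains capturing groups and re.findall would return the contents of those groups instead of the full matches. The function name words_around_colons is hypothetical.

```python
import re

PATTERN = re.compile(r'\w+[:]\w+|\w+[:](\s)\w+|\w+(\s)[:]\w+')

def words_around_colons(text):
    # group(0) is the entire match, regardless of the (\s) groups
    return [m.group(0) for m in PATTERN.finditer(text)]

print(words_around_colons("Name: abc and Age:24"))
# ['Name: abc', 'Age:24']
```

Only one word after each colon is kept, so "and" is dropped here; that differs from the "Name: abc and" output asked for in the question.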
Upvotes: 0