assassinezio

Reputation: 99

Search for a pattern in a list: Python regex

After analyzing the data and getting the required result, I append that result to a list.
Now I need to retrieve or separate the result (search for a pattern and obtain it).

Code:

data = []
data.append('\n'.join([' -> '.join(e) for e in paths]))

The list contains this data:
CH_Trans -> St_1 -> WDL
TRANSFER_Trn -> St_1
Access_Ltd -> MPL_Limited
IPIPI -> TLC_Pvt_Ltd
234er -> Three_Star_Services -> Asian_Pharmas -> PPP_Channel
Sonata_Ltd -> Three_Star_Services
Arc_Estates -> Russian_Hosp
A -> B -> C -> D -> E -> F
G -> H
ZN_INTBNKOUT_SET -> -2008_1 -> X
ZZ_1_ -> AA_2 -> AA_3 -> ZZ_1_
XYZ- -> ABC -> XYZ-
SSS -> BBB -> SSS
Rock_8CC -> Russ -> By_sus -> Rock_8CC

Note: Display or retrieve the patterns that contain two or more [->] symbols, i.e. at least three elements
( Txt -> txt -> txt )

I'm trying to get it done with a regex:

import re

for i in data:
    regex = r"\w+\s->\s\w+\s->\s\w+"  # raw string so the backslashes reach the regex engine
    match = re.findall(regex, i, re.MULTILINE)
    print(match)

Regex expressions I tried, but I was unable to get the required result:
#\w+\s->\s\w+\s->\s\w+
#\w+\s[-][>]\s\w+\s[-][>]\s\w+
#\w+\s[-][>]\s\w+\s[-][>]\s\w+\s[-][>]\s\w+

Result I got:
['CH_Trans -> St_1 -> WDL', '234er -> Three_Star_Services -> Asian_Pharmas',
 'A -> B -> C', 'D -> E -> F', 'ZZ_1_ -> AA_2 -> AA_3', 
'SSS -> BBB -> SSS', 'Rock_8CC -> Russ -> By_sus']

The required result I want to obtain is:

----Pattern I------
CH_Trans -> St_1 -> WDL
234er -> Three_Star_Services -> Asian_Pharmas -> PPP_Channel
A -> B -> C -> D -> E -> F
ZN_INTBNKOUT_SET -> -2008_1 -> X


# Pattern II consists of the patterns whose first and last elements are the same
----Pattern II------
ZZ_1_ -> AA_2 -> AA_3 -> ZZ_1_
XYZ- -> ABC -> XYZ-
SSS -> BBB -> SSS
Rock_8CC -> Russ -> By_sus -> Rock_8CC

Upvotes: 0

Views: 248

Answers (1)

tshiono

Reputation: 22087

Would you please try the following as a starting point:

import re

regex = r'^\S+(?:\s->\s\S+){2,}$'
for i in data:  # assumes each item of data is a single line
    m = re.match(regex, i)
    if m:
        print(m.group())

Results (Pattern I + Pattern II):

CH_Trans -> St_1 -> WDL
234er -> Three_Star_Services -> Asian_Pharmas -> PPP_Channel
A -> B -> C -> D -> E -> F
ZN_INTBNKOUT_SET -> -2008_1 -> X
ZZ_1_ -> AA_2 -> AA_3 -> ZZ_1_
XYZ- -> ABC -> XYZ-
SSS -> BBB -> SSS
Rock_8CC -> Russ -> By_sus -> Rock_8CC

Explanation of the regex ^\S+(?:\s->\s\S+){2,}$:

^\S+       start with non-blank string
(?: ... )  grouping
\s->\s\S+  a blank followed by "->" followed by a blank and non-blank string
{2,}       repeats the previous pattern (or group) two or more times
$          end of the string
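
Note that the question builds `data` by appending one newline-joined string, while the loop above treats each item as a single line. A minimal sketch of bridging the two, assuming that shape of `data`; the short `paths` sample below is hypothetical and only stands in for the asker's real data:

```python
import re

# Hypothetical stand-in for the asker's `paths`; the real data is not shown.
paths = [['CH_Trans', 'St_1', 'WDL'], ['G', 'H'], ['SSS', 'BBB', 'SSS']]

data = []
data.append('\n'.join(' -> '.join(e) for e in paths))  # one multi-line string

regex = re.compile(r'^\S+(?:\s->\s\S+){2,}$')

# Split every block into individual lines before matching, so that
# ^ and $ anchor to one chain at a time.
matches = [line
           for block in data
           for line in block.splitlines()
           if regex.match(line)]

for line in matches:
    print(line)
```

Splitting first keeps the pattern unchanged and avoids relying on re.MULTILINE, where \s could in principle also match a newline.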

As for pattern II, please try:

regex = r'^(\S+)(?:\s->\s\S+){1,}\s->\s\1$'
for i in data:
    m = re.match(regex, i)
    if m:
        print(m.group())

Results:

ZZ_1_ -> AA_2 -> AA_3 -> ZZ_1_
XYZ- -> ABC -> XYZ-
SSS -> BBB -> SSS
Rock_8CC -> Russ -> By_sus -> Rock_8CC

Explanation of regex r'^(\S+)(?:\s->\s\S+){1,}\s->\s\1$':

- ^(\S+)     captures the 1st element and assigns \1 to it
- (?: ... )  grouping
- \s->\s\S+  a blank followed by "->" followed by a blank and non-blank string
- {1,}       repeats the previous pattern (or group) one or more times
- \s->\s\1   a blank followed by "->" followed by a blank and the 1st element \1
- $          end of the string

To obtain the result of pattern I, we may need to subtract the list of pattern II from the first results.
If we could say:

regex = r'^(\S+)(?:\s->\s\S+){2,}(?<!\1)$'

it would exclude the strings whose last element is the same as the 1st element, and we could obtain the result of pattern I directly. However, this regex currently raises an error saying "group references in lookbehind assertions".
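
One possible way around the lookbehind limitation, offered as a sketch rather than as part of the original answer, is to move the check to the front as a negative lookahead, where a backreference to a group captured inside the lookahead is allowed. The name `PATTERN_ONE` and the `lines` sample (taken from the question's data) are introduced here for illustration, and the code assumes one chain per string:

```python
import re

# Negative lookahead: reject a line whose last element repeats the first one.
# (\S+) captures the first element; \1 must then be the final element,
# directly preceded by " -> ". The rest is the "two or more arrows" pattern.
PATTERN_ONE = re.compile(r'^(?!(\S+)\s.*\s->\s\1$)\S+(?:\s->\s\S+){2,}$')

lines = [
    'CH_Trans -> St_1 -> WDL',
    'TRANSFER_Trn -> St_1',
    '234er -> Three_Star_Services -> Asian_Pharmas -> PPP_Channel',
    'A -> B -> C -> D -> E -> F',
    'G -> H',
    'ZN_INTBNKOUT_SET -> -2008_1 -> X',
    'ZZ_1_ -> AA_2 -> AA_3 -> ZZ_1_',
    'XYZ- -> ABC -> XYZ-',
    'SSS -> BBB -> SSS',
    'Rock_8CC -> Russ -> By_sus -> Rock_8CC',
]

pattern_one = [line for line in lines if PATTERN_ONE.match(line)]

for line in pattern_one:
    print(line)
```

If a single regex is not a hard requirement, splitting each matched line on " -> " and comparing the first and last elements in plain Python would give the same separation and may be easier to read.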

Upvotes: 1
