royskatt

Reputation: 1210

Python - Regex, multiple match

I have a file which contains, among other things, SQL CREATE TABLE commands. I want to collect all of these commands in a list (not implemented yet), with each command in a separate list entry.

My problem is that the regular expression returns only the first match, although there should be three.

Source file:

abcd
something
CREATE TABLE schema.test1(attribute1 DECIMAL(28, 7)  NULL , 
ATTRIBUTE2 DECIMAL(28, 7)  KEY  NOT NULL , 
ATTRIBUTE3 DECIMAL(28, 7)  NOT NULL , 
SET("db_alias_name" = 'TEST')
;

efgh
something else
CREATE TABLE schema.test2(attribute1 DECIMAL(28, 7)  NULL , 
ATTRIBUTE2 DECIMAL(28, 7)  KEY  NOT NULL , 
ATTRIBUTE3 DECIMAL(28, 7)  NOT NULL , 
SET("db_alias_name" = 'TEST')
;

something else
CREATE TABLE schema.test3(attribute1 DECIMAL(28, 7)  NULL , 
ATTRIBUTE2 DECIMAL(28, 7)  KEY  NOT NULL , 
ATTRIBUTE3 DECIMAL(28, 7)  NOT NULL , 
SET("db_alias_name" = 'TEST')
;
something else
12346
higkl

My script only returns the first match:

CREATE TABLE schema.test1(attribute1 DECIMAL(28, 7)  NULL , 
ATTRIBUTE2 DECIMAL(28, 7)  KEY  NOT NULL , 
ATTRIBUTE3 DECIMAL(28, 7)  NOT NULL , 
SET("db_alias_name" = 'TEST')

Script:

# -*- coding: utf-8 -*-
import os
import re

create_table_parts = []

atlfile = 'example.txt'
data = ''

def read_file(afile):
    with open(afile) as atl:
        text = atl.read()
        return text

data = read_file(atlfile)
data_utf8 = unicode(data, "utf-8")

round1 = re.search(r"(CREATE\sTABLE).+?(?=;)", data_utf8, re.MULTILINE|re.DOTALL)
print round1.group()

Can you tell me what's wrong here?

Upvotes: 1

Views: 124

Answers (3)

royskatt

Reputation: 1210

Thanks to Mark's hint, here is a working solution:

# -*- coding: utf-8 -*-
import os
import re

create_table_parts = []
atlfile = 'example.txt'
data = ''

def read_file(afile):
    with open(afile) as atl:
        text = atl.read()
        return text

data = read_file(atlfile)
data_utf8 = unicode(data, "utf-8")


def round1_get_CT(text):
    match_list = []
    someIter = re.finditer(r"(CREATE\sTABLE).+?(?=;)", text, re.MULTILINE|re.DOTALL)
    for mObj in someIter:
        #print mObj.group()
        match_list.append(mObj.group())
    return match_list

create_table_parts = round1_get_CT(data_utf8)

print "\n".join(create_table_parts)

Upvotes: 0

Mark

Reputation: 108512

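re.search stops after the first match. You'd be better off using re.finditer, because it yields a match object (like the one search returns) for every non-overlapping match:

someIter = re.finditer(r"(CREATE\sTABLE).+?(?=;)", data_utf8, re.MULTILINE|re.DOTALL)
for mObj in someIter:
    # process mObj, e.g. print the matched text
    print mObj.group()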
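
For readers on Python 3, the same idea can be sketched as below. This is only an illustration: the SAMPLE text and the helper name find_create_statements are made up here and stand in for the contents of example.txt and for whatever processing is needed. The pattern is the one from the question without the capture group and without re.MULTILINE, which only changes how ^ and $ behave and so has no effect on this pattern.

```python
import re

# Hypothetical sample text standing in for the contents of example.txt.
SAMPLE = """abcd
CREATE TABLE schema.test1(attribute1 DECIMAL(28, 7)  NULL ,
SET("db_alias_name" = 'TEST')
;
efgh
CREATE TABLE schema.test2(attribute1 DECIMAL(28, 7)  NULL ,
SET("db_alias_name" = 'TEST')
;
"""

# re.DOTALL lets "." cross line breaks; the lookahead stops before the ";".
PATTERN = re.compile(r"CREATE\sTABLE.+?(?=;)", re.DOTALL)


def find_create_statements(text):
    """Return (start_offset, statement) for every CREATE TABLE match."""
    return [(m.start(), m.group().strip()) for m in PATTERN.finditer(text)]


statements = find_create_statements(SAMPLE)
for offset, statement in statements:
    # Print where each statement starts and its first line.
    print(offset, statement.splitlines()[0])
```

Because each item is a match object, details such as m.start() are available in addition to the matched text, which is the main reason to prefer finditer when more than the plain string is needed.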

Upvotes: 2

Oscar Olsson

Reputation: 21

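You could use findall instead, see https://docs.python.org/2/library/re.html#re.findall. Note that when the pattern contains a capturing group, as (CREATE\sTABLE) does here, findall returns only the text of that group rather than the whole match, so remove the parentheses or make the group non-capturing with (?:...).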
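
A small Python 3 sketch of how findall treats the pattern with and without a capturing group. The SAMPLE string is a shortened, made-up input, not the file from the question.

```python
import re

# Hypothetical two-statement sample; the real input would come from the file.
SAMPLE = "x\nCREATE TABLE a(c1 INT)\n;\ny\nCREATE TABLE b(c1 INT)\n;\n"

FLAGS = re.DOTALL

# With a capturing group, findall returns only the group's text.
with_group = re.findall(r"(CREATE\sTABLE).+?(?=;)", SAMPLE, FLAGS)

# With a non-capturing group (?:...), findall returns the whole match.
without_group = re.findall(r"(?:CREATE\sTABLE).+?(?=;)", SAMPLE, FLAGS)

print(with_group)     # only the keyword, once per statement
print(without_group)  # the full statements, up to but excluding ";"
```

The second form gives a ready-made list of strings, which fits the goal of one list entry per command.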

Upvotes: 1
