tom at zepsu dot com

Reputation: 307

Regular Expression, find between words

I have a string that looks a little like

Name: xxx xxx
Company Name: xxx xxx xx
Company Type: xxxx
Tel: xxxx
Email: xxxxxxx
Postcode: xxxxxx

I am trying to pull out the xxx parts.

I am using preg_match_all to do so, but I can't grasp the regular expression I need :( I have been reading various tutorials around the web and now I understand it all even less.

I presume I could do something like

find ^Name:(then any amount of words spaces etc till I get to) Company Name$ then ^Company Name:(then any amount of words spaces etc till I get to) Company Type$

If somebody could just start me off, maybe with a small explanation to help me understand things more, that would be great. For example, the term "matches": how do I define what is a match and what is ignored? I just want the xxx parts in an array, so if I did ^Name:[a-zA-Z0-9]$, would that all be a match or just the bit in []?

Regards.

Edit: Adding the php code I am using.

foreach( $value as $k => &$v ){
    if( $k == "history_date_created" ){
        $v = date( "D jS M Y @ H:i:s", strtotime($v) );
    }

    if( $k == "history_text" ){
        //Name: xxx xxxx Company Name: xxxx xxxx Company Type: xxxx xxxx Tel: xxxx xxxx Email: xxxx xxxx Postcode: xxxx xxxx To Email: xxxx xxxx Subscription: none
        $pattern = "/Name: (.*) Company Name: (.*) Company Type: (.*) Tel: (.*) Email: (.*)/U";
        preg_match_all( $pattern, $v, $matches, PREG_SET_ORDER );
        print_r( $matches );
    }
}

Basically, I have pulled a row from a database. Unfortunately "history_text" is a text field that, in my opinion, is stored wrongly, but I can do nothing to change this now, so I need to pull out the different values with regex. The history_text field is created by a form, so "Name:", "Company Name:" etc. will always be the same. The values of each will not: they are entered by users and could be anything, including blank.

Edit: My answer:

No regex needed. This is what I did in the end:

foreach( $value as $k => &$v ){
    if( $k == "history_date_created" ){
        $v = date( "D jS M Y @ H:i:s", strtotime($v) );
    }

    if( $k == "history_text" ){
        $matches = explode("\n", $v);

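        foreach( $matches as $match ){
            // Limit to 2 parts so a colon inside the value is kept.
            $boom = explode( ":", $match, 2 );
            // Skip lines that have no colon at all.
            if( count($boom) == 2 ){
                $value[trim($boom[0])] = trim($boom[1]);
            }
        }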
    }
}

Upvotes: 0

Views: 955

Answers (3)

Adam Fowler

Reputation: 1751

There isn't really a good way to separate your data, because there is no separator between xxxx and Company Name. If it were company_name instead, this might not be such a problem.

Look into a regex solution, or use the explode function (maybe twice), with ":" and with spaces " ".

Upvotes: 0

Zulan

Reputation: 22660

Try this:

preg_match_all("/Name: (.*) Company Name: (.*) Company Type: (.*) Tel: (.*) Email: (.*)/U", $x, $matches, PREG_SET_ORDER);

A few notes about this:

  • . matches any single character, except newlines by default
  • * extends it to match any number of characters (including none)
  • () captures whatever it encloses as a submatch. You can also use other character classes if you want to limit it further.
  • The U modifier (after the closing /) makes the matching non-greedy. This can be helpful to avoid .* matching parts of your "control text", e.g. when you have multiple matches on a single line.
  • The parameter PREG_SET_ORDER usually makes it more convenient to iterate through the matches array, which you can then access e.g. by $matches[4][2] for the Company Name of the 5th match, instead of $matches[2][4] with the default pattern ordering.
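To make the notes above concrete, here is a minimal sketch of how the whole match, the () submatches and the two result orderings relate. The sample text and the shortened two-field pattern are made up for illustration and are not the asker's real data. Index 0 of each match is everything the pattern matched; indexes 1 and up are only the parts inside ().

```php
<?php
// Hypothetical sample: two records, one per line.
$text = "Name: Ann Lee Tel: 123\nName: Bob Ray Tel: 456";

// The m modifier makes ^ and $ match at line boundaries,
// which bounds the last (.*) at the end of each line.
$pattern = '/^Name: (.*) Tel: (.*)$/m';

// Default ordering (PREG_PATTERN_ORDER): $byGroup[group][match]
preg_match_all($pattern, $text, $byGroup);

// PREG_SET_ORDER: $bySet[match][group]
preg_match_all($pattern, $text, $bySet, PREG_SET_ORDER);

echo $bySet[0][0], "\n";   // Name: Ann Lee Tel: 123  (the whole first match)
echo $bySet[0][1], "\n";   // Ann Lee                 (first submatch only)
echo $byGroup[1][0], "\n"; // Ann Lee                 (same value, other ordering)
echo $bySet[1][2], "\n";   // 456                     (second match, second submatch)
```

So the labels such as "Name:" are part of the match but never appear in the submatches, which is usually what you want when only the values matter.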

EDIT: I assume that you know the actual "description terms" such as "Company Name"; otherwise it will be impossible in general to distinguish between "(XXX XXX Company) Name:" and "(XXX XXX) Company Name:".

Also note that you need only preg_match to capture a single instance of such a 'line', while preg_match_all is helpful for capturing multiple 'lines'.
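For the single-record case, a sketch with preg_match might look like the following. It assumes one record with one "Label: value" pair per line; the sample values are invented. One thing to watch with non-greedy matching: a non-greedy group with nothing after it in the pattern matches the empty string, so the last group here uses \S* instead (which in turn assumes the email contains no whitespace).

```php
<?php
// Hypothetical record; the labels are the fixed ones from the form.
$text = "Name: Jo Bloggs\nCompany Name: Acme Ltd\nCompany Type: Retail\n"
      . "Tel: 0123\nEmail: jo@example.com";

// \s+ accepts spaces or newlines between a value and the next label.
// (.*?) is non-greedy, so each value stops at the next label.
// The s modifier lets . match newlines as well.
$pattern = '/Name:\s*(.*?)\s+Company Name:\s*(.*?)\s+Company Type:\s*(.*?)'
         . '\s+Tel:\s*(.*?)\s+Email:\s*(\S*)/s';

if (preg_match($pattern, $text, $m)) {
    // $m[0] is the whole match; $m[1]..$m[5] are the captured values.
    $record = [
        'Name'         => $m[1],
        'Company Name' => $m[2],
        'Company Type' => $m[3],
        'Tel'          => $m[4],
        'Email'        => $m[5],
    ];
    print_r($record);
}
```

Blank values should come back as empty strings, as long as each label is still followed by some whitespace.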

Upvotes: 1

tcak

Reputation: 2212

It looks a little hard and complex to do this with regex alone. But you can use a regex for the : (colon) symbols.

/[^:]*/

This will give you all the strings before each colon symbol. Then you can cut the last part off each of those strings, e.g. if strpos of "Company Name" !== FALSE, cut that part off the end of the string. That gives you the value of Name.

You can use the same logic for the other parts.
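A rough sketch of that idea, under a few assumptions: the record is on a single line, the labels always appear in a known order, and no value contains a colon. It uses explode instead of the /[^:]*/ pattern to get the pieces between colons, and the sample text and $labels list are made up.

```php
<?php
// Hypothetical single-line record and the fixed labels, in order.
$text   = "Name: Jo Bloggs Company Name: Acme Ltd Company Type: Retail Tel: 0123";
$labels = ["Name", "Company Name", "Company Type", "Tel"];

$parts  = explode(":", $text); // pieces between the colons
$result = [];

for ($i = 1; $i < count($parts); $i++) {
    $piece = $parts[$i];

    // Every piece except the last ends with the next label: cut it off.
    if ($i < count($labels)) {
        $next = $labels[$i];
        if (substr($piece, -strlen($next)) === $next) {
            $piece = substr($piece, 0, -strlen($next));
        }
    }

    // The piece now holds the value for the previous label.
    $result[$labels[$i - 1]] = trim($piece);
}

print_r($result);
```

If a value can contain a colon (a URL, say), this breaks, and one of the regex approaches is probably the safer route.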

Upvotes: 1
