Wiliam

Reputation: 3754

Match a PHP class with a Regular Expression

I want to match PHP classes in a file:

class a {
   function test() { }
}

class b extends a {
   function test() { }
}

and the resulting matches should be

class a {
   function test() { }
}

and

class b extends a {
   function test() { }
}

Upvotes: 1

Views: 1407

Answers (6)

Ensai Tankado

Reputation: 343

Here is the official pattern for a valid class name:

^[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*$

from https://www.php.net/manual/en/language.oop5.basic.php

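So a pattern for the class keyword followed by a valid name (the declaration line only, not the class body) would be: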

class[\s]{1,}[a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*
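A quick sketch of what this pattern captures on the example from the question. The \b word boundary, the \s+ shorthand and the capture group around the name are additions for illustration, not part of the answer. Note that it only finds the "class a" header; the class body still has to be matched some other way.

```php
<?php
// Match "class" followed by a valid class name, using the name pattern from
// the PHP manual. Only the declaration header is matched, not the body.
$pattern = '/\bclass\s+([a-zA-Z_\x80-\xff][a-zA-Z0-9_\x80-\xff]*)/';

$source = <<<'PHP'
class a {
   function test() { }
}

class b extends a {
   function test() { }
}
PHP;

preg_match_all($pattern, $source, $matches);

echo implode(', ', $matches[0]), "\n"; // class a, class b
echo implode(', ', $matches[1]), "\n"; // a, b
```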

Upvotes: 0

Wiliam

Reputation: 3754

The following regex works for now:

^(?:(public|protected|private|abstract)\s+)?class\s+([a-z0-9_]+)(?:\s+extends\s+([a-z0-9_]+))?(?:\s+implements\s+([a-z0-9_]+))?.+?{.+?^}

It needs these options: case-insensitive (i), ^ and $ match at line breaks (m), and dot matches newlines (s).

This only works if the class keyword and the closing "}" are not indented.
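A minimal sketch of running that regex with preg_match_all(), assuming the three options map to the PCRE modifiers i, m and s. The braces are escaped here for readability, which does not change what the pattern matches. The last line illustrates the indentation limitation: an indented class is not matched at all.

```php
<?php
// Wiliam's pattern with the i (case-insensitive), m (^/$ at line breaks)
// and s (dot matches newline) modifiers. { and } are escaped for clarity.
$pattern = '/^(?:(public|protected|private|abstract)\s+)?class\s+([a-z0-9_]+)'
    . '(?:\s+extends\s+([a-z0-9_]+))?(?:\s+implements\s+([a-z0-9_]+))?.+?\{.+?^\}/ims';

$source = <<<'PHP'
class a {
   function test() { }
}

class b extends a {
   function test() { }
}
PHP;

preg_match_all($pattern, $source, $matches);

// $matches[0] holds the whole class, $matches[2] the class name.
foreach ($matches[0] as $i => $whole) {
    echo "=== {$matches[2][$i]} ===\n$whole\n";
}

// An indented class is not matched, because ^class must start a line.
var_dump(preg_match($pattern, "    class c {\n    }")); // int(0)
```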

Upvotes: 0

Gumbo

Reputation: 655309

Use token_get_all to get an array of the language tokens of the PHP code. Then iterate over it and look for a token with the value T_CLASS, which represents the class keyword (this does not take abstract classes or the visibility into account). The next T_STRING token is the name of the class. Then look for the next plain token whose value is {, increment a block-depth counter, and decrement it at every plain } token until you have visited as many closing braces as opening braces (the counter is then 0). At that point you have walked the whole class declaration.
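A minimal sketch of that token walk, not a complete parser; the function names extractClasses and nextSignificant are hypothetical. It adds three details the description leaves out: Foo::class also produces a T_CLASS token, anonymous classes (new class { ... }) have no T_STRING name, and "{$var}" / "${var}" inside strings open with T_CURLY_OPEN / T_DOLLAR_OPEN_CURLY_BRACES tokens but close with a plain "}", so they must be counted as openers. The code must start with <?php, otherwise token_get_all() returns it all as inline HTML. As noted above, modifiers such as abstract are not included in the extracted text.

```php
<?php
// Index of the next token (step +1) or previous token (step -1) that is not
// whitespace or a comment, or null if there is none.
function nextSignificant(array $tokens, int $i, int $step): ?int
{
    $skip = [T_WHITESPACE, T_COMMENT, T_DOC_COMMENT];
    for ($i += $step; $i >= 0 && $i < count($tokens); $i += $step) {
        if (!is_array($tokens[$i]) || !in_array($tokens[$i][0], $skip, true)) {
            return $i;
        }
    }
    return null;
}

// Returns [className => source text from "class" to its closing brace].
function extractClasses(string $code): array
{
    $tokens = token_get_all($code);
    $count = count($tokens);
    $classes = [];

    for ($i = 0; $i < $count; $i++) {
        if (!is_array($tokens[$i]) || $tokens[$i][0] !== T_CLASS) {
            continue;
        }
        // Foo::class also produces a T_CLASS token; skip it.
        $prev = nextSignificant($tokens, $i, -1);
        if ($prev !== null && is_array($tokens[$prev]) && $tokens[$prev][0] === T_DOUBLE_COLON) {
            continue;
        }
        // Anonymous classes have no T_STRING name; skip them.
        $next = nextSignificant($tokens, $i, 1);
        if ($next === null || !is_array($tokens[$next]) || $tokens[$next][0] !== T_STRING) {
            continue;
        }
        $name = $tokens[$next][1];

        // Walk forward, counting braces until the depth returns to 0.
        $depth = 0;
        $text = '';
        for ($j = $i; $j < $count; $j++) {
            $tok = $tokens[$j];
            $text .= is_array($tok) ? $tok[1] : $tok;
            if ($tok === '{' || (is_array($tok)
                && in_array($tok[0], [T_CURLY_OPEN, T_DOLLAR_OPEN_CURLY_BRACES], true))) {
                $depth++;
            } elseif ($tok === '}') {
                $depth--;
                if ($depth === 0) {
                    break;
                }
            }
        }
        $classes[$name] = $text;
        $i = $j; // resume scanning after this class body
    }
    return $classes;
}

$code = <<<'SRC'
<?php
class a {
   function test() { }
}

class b extends a {
   function test() { }
}
SRC;

foreach (extractClasses($code) as $name => $source) {
    echo "=== $name ===\n$source\n";
}
```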

Upvotes: 0

quantumSoup

Reputation: 28132

Here's what you should use:

http://www.php.net/manual/en/function.token-get-all.php

Upvotes: 0

cHao

Reputation: 86525

A single regex won't do this. PHP is a more complex language than regex (insert something about context-free and regular grammars here). It'll drive you crazy to even try, unless you alter your source code to make it easier for the regex to match.

Upvotes: 1

user187291

Reputation: 53940

Regexps are poor at parsing programming-language grammars. Consider the tokenizer functions instead, e.g. http://php.net/manual/en/function.token-get-all.php. See also http://framework.zend.com/apidoc/core/Zend_Reflection/Zend_Reflection_File.html
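In the same spirit as Zend_Reflection_File, PHP's built-in ReflectionClass reports the first and last line of each class (getStartLine(), getEndLine()), so the class source can be sliced out of the file. This is a rough sketch, not Zend's API; the helper name classesViaReflection is hypothetical. It has to require the file, which executes it, so it only suits trusted code whose classes are not already loaded.

```php
<?php
// Hypothetical helper: return [className => source lines of the class].
// It require_once's the file, so the file's code is executed.
function classesViaReflection(string $file): array
{
    $before = get_declared_classes();
    require_once $file;
    $declared = array_diff(get_declared_classes(), $before);

    $lines = file($file); // each line keeps its trailing newline
    $result = [];
    foreach ($declared as $class) {
        $ref = new ReflectionClass($class);
        $start = $ref->getStartLine(); // 1-based line of the declaration
        $end = $ref->getEndLine();     // line of the closing brace
        $result[$ref->getName()] = implode('', array_slice($lines, $start - 1, $end - $start + 1));
    }
    return $result;
}

// Usage: write the example from the question to a temporary file.
$file = tempnam(sys_get_temp_dir(), 'cls');
file_put_contents($file, "<?php\nclass a {\n   function test() { }\n}\n\nclass b extends a {\n   function test() { }\n}\n");

foreach (classesViaReflection($file) as $name => $source) {
    echo "=== $name ===\n$source";
}
unlink($file);
```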

Upvotes: 5
