Reputation: 33
I need to extract the house number, covering all the different combinations used in Austria:
| Address                                 | House number | Stairs | Floor | Door |
| --------------------------------------- | ------------ | ------ | ----- | ---- |
| Lilienstr. 12a | 12a | | | |
| Leibnizstraße 36/28/2 | 36 | 28 | | 2 |
| Prager Straße 14/3/1/4 | 14 | 3 | 1 | 4 |
| Guentherstr. 43 B | 43 B | | | |
| Eberhard-Leibnitz Str. 1/7 | 1 | | | 7 |
| Schießstätte 7/7 | 7 | | | 7 |
I've already found this question: Regex to extract (german) street number.
It works only if no stairs, floor or door is given. Can you help?
^[ \-0-9a-zA-ZäöüÄÖÜß.]+?\s+(\d+(\s?[a-zA-Z])?)\s*(?:$|\(|[A-Z]{2})
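To make the problem concrete, here is a minimal sketch that runs the pattern above against some of the sample addresses. The `~` delimiters are an assumption (the pattern is given without delimiters), and the variable names are illustrative only.

```php
<?php
// The pattern from the question, wrapped in ~ delimiters (no "u" flag, as in the original)
$re = '~^[ \-0-9a-zA-ZäöüÄÖÜß.]+?\s+(\d+(\s?[a-zA-Z])?)\s*(?:$|\(|[A-Z]{2})~';

$addresses = ['Lilienstr. 12a', 'Guentherstr. 43 B', 'Leibnizstraße 36/28/2', 'Schießstätte 7/7'];
$numbers = [];
foreach ($addresses as $address) {
    // Group 1 holds the house number; null means the pattern did not match at all
    $numbers[$address] = preg_match($re, $address, $m) ? $m[1] : null;
}
print_r($numbers);
```

The addresses with a `/` after the house number get `null`: after the digits the pattern only accepts the end of the string, a `(` or two capital letters, and the `/` is none of these.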
Upvotes: 3
Views: 364
Reputation: 163287
Credit for the core of the pattern, the optional capturing groups with a positive lookahead, goes to @JvdV, who suggested it in the comments.
As an alternative, you can get the group numbers/names in the same order as the table in the question by capturing the digits of the stairs/floor/door and asserting how many parts, each a forward slash followed by digits, are directly to the right.
If an assertion fails, the pattern moves on to the next part, since all the groups are optional.
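The counting idea can be tried in isolation. The minimal sketch below (not part of the original answer; the anchored `~^...~` pattern and the variable names are illustrative assumptions) applies only the stairs part of the pattern to the text that follows the house number.

```php
<?php
// Only the "stairs" part: a number counts as stairs when 1 or 2 more /number parts follow
$stairsRe = '~^/(?<stairs>\d+)(?=(?:/\d+){1,2})~';

$tails = ['/28/2', '/3/1/4', '/7'];
$found = [];
foreach ($tails as $tail) {
    // preg_match returns 1 only when the lookahead succeeds
    $found[$tail] = preg_match($stairsRe, $tail, $m) ? $m['stairs'] : null;
}
print_r($found);
```

For `/7` the lookahead fails, so in the full pattern that number falls through to the later optional groups and ends up as the door.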
^(?<address>(?<streetname>\h*\S.*?)\h*(?<housenumber>\d+\h*[A-Za-z]?))(?:/(?<stairs>\d+)(?=(?:/\d+){1,2}))?(?:/(?<floor>\d+)(?=(?:/\d+)))?(?:/(?<door>\d+))?$
In parts

- `^` Start of string
- `(?<address>` Group address
- `(?<streetname>` Group streetname
- `\h*\S.*?` Match 0+ horizontal whitespace chars, then a non-whitespace char to make sure the street name is not empty, then any char as few times as possible (non-greedy)
- `)` Close group streetname
- `\h*` Match 0+ horizontal whitespace chars for the trailing spaces after the street name
- `(?<housenumber>` Group housenumber
- `\d+\h*[A-Za-z]?` Match 1+ digits, 0+ horizontal whitespace chars and an optional char a-z or A-Z
- `)` Close group housenumber
- `)` Close group address
- `(?:` Non-capture group
- `/(?<stairs>\d+)` Group stairs, match 1+ digits
- `(?=(?:/\d+){1,2})` Positive lookahead, assert that what is directly to the right is 1 or 2 repetitions of a `/` followed by 1+ digits
- `)?` Close the non-capture group and make it optional
- `(?:` Non-capture group
- `/(?<floor>\d+)` Group floor, match 1+ digits
- `(?=(?:/\d+))` Positive lookahead, assert that what is directly to the right is a `/` followed by 1+ digits
- `)?` Close the non-capture group and make it optional
- `(?:` Non-capture group
- `/(?<door>\d+)` Group door, match 1+ digits
- `)?` Close the non-capture group and make it optional
- `$` End of string

Example code
$re = '~^(?<address>(?<streetname>\h*\S.*?)\h*(?<housenumber>\d+\h*[A-Za-z]?))(?:/(?<stairs>\d+)(?=(?:/\d+){1,2}))?(?:/(?<floor>\d+)(?=(?:/\d+)))?(?:/(?<door>\d+))?$~';
$strings = [
    "Lilienstr. 12a",
    "Leibnizstraße 36/28/2",
    "Prager Straße 14/3/1/4",
    "Guentherstr. 43 B",
    "Eberhard-Leibnitz Str. 1/7",
    "Schießstätte 7/7"
];
foreach ($strings as $string) {
    // Each string holds a single address, so preg_match is enough
    if (preg_match($re, $string, $matches)) {
        // Keep only the named groups (string keys); ARRAY_FILTER_USE_KEY needs PHP 5.6+
        $address = array_filter($matches, "is_string", ARRAY_FILTER_USE_KEY);
        print_r($address);
    }
}
Output
Array
(
[address] => Lilienstr. 12a
[streetname] => Lilienstr.
[housenumber] => 12a
)
Array
(
[address] => Leibnizstraße 36
[streetname] => Leibnizstraße
[housenumber] => 36
[stairs] => 28
[floor] =>
[door] => 2
)
Array
(
[address] => Prager Straße 14
[streetname] => Prager Straße
[housenumber] => 14
[stairs] => 3
[floor] => 1
[door] => 4
)
Array
(
[address] => Guentherstr. 43 B
[streetname] => Guentherstr.
[housenumber] => 43 B
)
Array
(
[address] => Eberhard-Leibnitz Str. 1
[streetname] => Eberhard-Leibnitz Str.
[housenumber] => 1
[stairs] =>
[floor] =>
[door] => 7
)
Array
(
[address] => Schießstätte 7
[streetname] => Schießstätte
[housenumber] => 7
[stairs] =>
[floor] =>
[door] => 7
)
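The arrays for `Lilienstr. 12a` and `Guentherstr. 43 B` have no `stairs`/`floor`/`door` keys because `preg_match` leaves out trailing groups that did not take part in the match, while `floor` shows up as an empty string when a later group matched. If you would rather always get every key, with `null` for missing parts, a sketch using the `PREG_UNMATCHED_AS_NULL` flag could look like this. It assumes PHP 7.4 or later (where trailing unmatched groups are also returned as `null`) and writes the floor group as `\d+`.

```php
<?php
$re = '~^(?<address>(?<streetname>\h*\S.*?)\h*(?<housenumber>\d+\h*[A-Za-z]?))(?:/(?<stairs>\d+)(?=(?:/\d+){1,2}))?(?:/(?<floor>\d+)(?=(?:/\d+)))?(?:/(?<door>\d+))?$~';

$results = [];
foreach (["Lilienstr. 12a", "Leibnizstraße 36/28/2"] as $string) {
    // Groups that did not participate are reported as null instead of "" or being omitted
    if (preg_match($re, $string, $matches, PREG_UNMATCHED_AS_NULL)) {
        // Keep only the named groups (string keys)
        $results[$string] = array_filter($matches, 'is_string', ARRAY_FILTER_USE_KEY);
    }
}
var_dump($results);
```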
Upvotes: 2
Reputation: 3684
I don't know Austrian address formats, so it's hard for me to say whether this is correct, but please see the regex below.
^(.*)\s+(\d+(?:\s*[a-zA-Z])?)(?:\/(\d+))?(?:\/(\d+))?(?:\/(\d+))?\s*(?:$|\(|[A-Z]{2})
This expression captures up to four slash-separated numbers (as in 1/2/3/4) and always fills the groups from left to right, so you need some additional processing to decide whether an address has a house number, stairs, floor and door, or only a house number and door.
For example:
<?php
$pattern = '^(.*)\s+(\d+(?:\s*[a-zA-Z])?)(?:\/(\d+))?(?:\/(\d+))?(?:\/(\d+))?\s*(?:$|\(|[A-Z]{2})$';
$addresses = [
    'Lilienstr. 12a',
    'Leibnizstraße 36/28/2',
    'Prager Straße 14/3/1/4',
    'Guentherstr. 43 B',
    'Eberhard-Leibnitz Str. 1/7',
    'Schießstätte 7/7'
];
$results = [];
foreach ( $addresses as $address ) {
    // Capture groups:
    // 0. Full match
    // 1. Streetname
    // 2. Housenumber
    // 3.-5. The following /number parts (their meaning depends on how many there are)
    if ( ! preg_match( '/' . $pattern . '/', $address, $matches ) ) {
        // Skip addresses the pattern cannot parse
        continue;
    }
    // Remove the full match from the array
    array_shift( $matches );
    // The first remaining value is the street name
    $streetname = array_shift( $matches );
    // Set up default values
    $housenumber = null;
    $stairs = null;
    $floor = null;
    $door = null;
    // Count the number parts that were actually captured
    $total = count( array_filter( array_map( 'trim', $matches ) ) );
    switch ( $total ) {
        // Has all 4 parts
        case 4:
            $housenumber = $matches[ 0 ];
            $stairs = $matches[ 1 ];
            $floor = $matches[ 2 ];
            $door = $matches[ 3 ];
            break;
        // Only has 3 parts
        case 3:
            $housenumber = $matches[ 0 ];
            $stairs = $matches[ 1 ];
            $door = $matches[ 2 ];
            break;
        // Only has 2 parts
        case 2:
            $housenumber = $matches[ 0 ];
            $door = $matches[ 1 ];
            break;
        // Has 1 part
        default:
            $housenumber = $matches[ 0 ];
            break;
    }
    // Add to results array
    $results[] = [
        'address' => $address,
        'streetname' => $streetname,
        'housenumber' => $housenumber,
        'stairs' => $stairs,
        'floor' => $floor,
        'door' => $door
    ];
}
print_r( $results );
Output
Array
(
[0] => Array
(
[address] => Lilienstr. 12a
[streetname] => Lilienstr.
[housenumber] => 12a
[stairs] =>
[floor] =>
[door] =>
)
[1] => Array
(
[address] => Leibnizstraße 36/28/2
[streetname] => Leibnizstraße
[housenumber] => 36
[stairs] => 28
[floor] =>
[door] => 2
)
[2] => Array
(
[address] => Prager Straße 14/3/1/4
[streetname] => Prager Straße
[housenumber] => 14
[stairs] => 3
[floor] => 1
[door] => 4
)
[3] => Array
(
[address] => Guentherstr. 43 B
[streetname] => Guentherstr.
[housenumber] => 43 B
[stairs] =>
[floor] =>
[door] =>
)
[4] => Array
(
[address] => Eberhard-Leibnitz Str. 1/7
[streetname] => Eberhard-Leibnitz Str.
[housenumber] => 1
[stairs] =>
[floor] =>
[door] => 7
)
[5] => Array
(
[address] => Schießstätte 7/7
[streetname] => Schießstätte
[housenumber] => 7
[stairs] =>
[floor] =>
[door] => 7
)
)
See here: http://sandbox.onlinephpfunctions.com/code/3952b2f3cab251e7137bcd9d55e42d8c8bcdd723
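The switch statement above maps the number of captured parts to field names. A table-driven version of the same mapping is sketched below. `assignParts()` is a hypothetical helper, the arrow function requires PHP 7.4+, and it compares against `''` instead of relying on truthiness, because `array_filter()` without a callback would also drop a floor or door number of `"0"`.

```php
<?php
/**
 * Hypothetical helper: maps the numbers captured after the street name onto
 * field names with the same rules as the switch statement:
 * 1 part -> housenumber, 2 -> housenumber/door,
 * 3 -> housenumber/stairs/door, 4 -> housenumber/stairs/floor/door.
 */
function assignParts(array $numbers): array
{
    $layouts = [
        1 => ['housenumber'],
        2 => ['housenumber', 'door'],
        3 => ['housenumber', 'stairs', 'door'],
        4 => ['housenumber', 'stairs', 'floor', 'door'],
    ];
    // Drop empty groups, keeping a literal "0", and reindex from 0
    $numbers = array_values(array_filter(
        array_map('trim', $numbers),
        fn($n) => $n !== ''
    ));
    // Every field starts as null
    $result = array_fill_keys(['housenumber', 'stairs', 'floor', 'door'], null);
    $layout = $layouts[count($numbers)] ?? [];
    foreach ($layout as $i => $field) {
        $result[$field] = $numbers[$i];
    }
    return $result;
}

print_r(assignParts(['36', '28', '2']));
print_r(assignParts(['1', '7']));
```

In the loop above it would be called with the `$matches` array that remains after the two `array_shift()` calls, in place of the `$total` count and the switch.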
Upvotes: 1