Reputation: 67
I have a collection of files with a certain structure:
COMPANY_DE-Actual-Contents-of-File-RGB-ENG.pdf
Breakdown:
Ideally my result would be an array containing the above info under named keys, but I don't know where to start.
Help would be greatly appreciated!
Thanks, Knal
Sorry to have been so unclear, but a few variables are not always present in the filename:
- DE -> Location, fixed options: '_DE', '_BE', or absent
- RGB -> Colormode, fixed options: 'RGB', 'CMYK', 'PMS', or absent
- ENG -> Language of file, fixed options: 'GER', 'ENG', or absent
Upvotes: 0
Views: 185
Reputation: 67
Inspired by @Armatus, I've constructed the following, which appears to be fail-safe:
$string = "COMPANY_DE-Actual-Contents+of-File-RGB-ENG.pdf";
$options_location = array('DE','BE');
$options_color = array('RGB','CMYK','PMS');
$options_language = array('ENG','GER');
$parts = preg_split( '/[\.\-\_]/', $string, -1, PREG_SPLIT_NO_EMPTY ); // -1 means no limit; passing NULL here is deprecated as of PHP 8.1
$data = array();
$data['company'] = array_shift($parts);
$data['filetype'] = array_pop($parts);
if( in_array( $parts[0], $options_location ) ){
    $data['location'] = array_shift($parts);
}else{
    $data['location'] = NULL;
}
if( in_array( end( $parts), $options_language ) ){
    $data['language'] = array_pop($parts);
}else{
    $data['language'] = NULL;
}
if( in_array( end( $parts), $options_color ) ){
    $data['colormode'] = array_pop($parts);
}else{
    $data['colormode'] = NULL;
}
$data['content'] = implode( ' ', $parts );
print_r( $data );
Upvotes: 0
Reputation: 91518
How about:
$files = array(
'COMPANY_DE-Actual-Contents-of-File-RGB-ENG.pdf',
'COMPANY_BE-Actual-Contents-of-File-CMYK-ENG.pdf',
'COMPANY_DE-Actual-Contents-of-File-PMS-GER.doc',
'COMPANY-Actual-Contents-of-File-PMS-GER.doc',
'COMPANY-Actual-Contents-of-File-GER.doc',
'COMPANY-Actual-Contents-of-File.doc',
);
foreach($files as $file) {
    preg_match('/^(?<COMPANY>.*?)_?(?<LOCATION>DE|BE)?-(?<CONTENT>.*?)-?(?<COLOR>RGB|CMYK|PMS)?-?(?<LANG>ENG|GER)?\.(?<EXT>[^.]+)$/', $file, $m);
    echo "\nfile=$file\n";
    echo "COMPANY: ",$m['COMPANY'],"\n";
    echo "LOCATION: ",$m['LOCATION'],"\n";
    echo "CONTENT: ",$m['CONTENT'],"\n";
    echo "COLOR: ",$m['COLOR'],"\n";
    echo "LANG: ",$m['LANG'],"\n";
    echo "EXT: ",$m['EXT'],"\n";
}
output:
file=COMPANY_DE-Actual-Contents-of-File-RGB-ENG.pdf
COMPANY: COMPANY
LOCATION: DE
CONTENT: Actual-Contents-of-File
COLOR: RGB
LANG: ENG
EXT: pdf
file=COMPANY_BE-Actual-Contents-of-File-CMYK-ENG.pdf
COMPANY: COMPANY
LOCATION: BE
CONTENT: Actual-Contents-of-File
COLOR: CMYK
LANG: ENG
EXT: pdf
file=COMPANY_DE-Actual-Contents-of-File-PMS-GER.doc
COMPANY: COMPANY
LOCATION: DE
CONTENT: Actual-Contents-of-File
COLOR: PMS
LANG: GER
EXT: doc
file=COMPANY-Actual-Contents-of-File-PMS-GER.doc
COMPANY: COMPANY
LOCATION:
CONTENT: Actual-Contents-of-File
COLOR: PMS
LANG: GER
EXT: doc
file=COMPANY-Actual-Contents-of-File-GER.doc
COMPANY: COMPANY
LOCATION:
CONTENT: Actual-Contents-of-File
COLOR:
LANG: GER
EXT: doc
file=COMPANY-Actual-Contents-of-File.doc
COMPANY: COMPANY
LOCATION:
CONTENT: Actual-Contents-of-File
COLOR:
LANG:
EXT: doc
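If the goal is a single array with named keys, as the question asks, the same pattern can be wrapped in a small helper. This is only a sketch: the function name parse_filename and the lowercase key names are made up for illustration, and the PREG_UNMATCHED_AS_NULL flag assumes PHP 7.2 or newer.

```php
<?php
// Hypothetical helper (not from the answer above): returns the named parts
// of a file name, or null if the name does not match the pattern at all.
function parse_filename(string $file): ?array
{
    $pattern = '/^(?<company>.*?)_?(?<location>DE|BE)?-(?<content>.*?)-?(?<colormode>RGB|CMYK|PMS)?-?(?<language>ENG|GER)?\.(?<filetype>[^.]+)$/';

    // PREG_UNMATCHED_AS_NULL reports absent optional groups as null
    // instead of an empty string.
    if (!preg_match($pattern, $file, $m, PREG_UNMATCHED_AS_NULL)) {
        return null;
    }

    // preg_match stores each named group under its name and its number;
    // keep only the string keys.
    return array_filter($m, 'is_string', ARRAY_FILTER_USE_KEY);
}

print_r(parse_filename('COMPANY-Actual-Contents-of-File-GER.doc'));
```

Absent parts come back as null, so the caller can tell "not present" apart from an empty value with a plain is_null() check.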
Upvotes: 0
Reputation: 2191
Try not to use regular expressions if possible, or keep them as simple as you can.
$text = "COMPANY_DE-Actual-Contents-of-File-RGB-ENG.pdf";
$options_location = array('DE','BE');
$options_color = array('RGB','CMYK','PMS');
$options_language = array('ENG','GER');
// $text may hold several file names, one per line, so split it first
$lines = explode("\n", $text);
// Then do the following for each line:
foreach($lines as $line) {
    $parts = preg_split('/[-_\.]/', $line);
    $data = array(); // result array
    $data['company'] = array_shift($parts); // The first element is always the company
    $data['filetype'] = array_pop($parts); // The last bit is always the file type
    foreach($parts as $part) { // we'll have to test each of the remaining ones for what it is
        if(in_array($part,$options_location))
            $data['location'] = $part;
        elseif(in_array($part,$options_color))
            $data['color'] = $part;
        elseif(in_array($part,$options_language))
            $data['lang'] = $part;
        else // wasn't any of the others, so attach it to the content
            $data['content'] = isset($data['content']) ? $data['content'].' '.$part : $part;
    }
    print_r($data);
}
This is easier to understand as well, instead of having to figure out what exactly a regex is doing.
Note that this assumes that no part of the content can be one of the words which are reserved for location, color or language. If it is possible for these to occur within the contents, you will have to add conditions like isset($data['location']) to check if there was already another location found and, if so, add the new one to the content instead of storing it as the location.
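A minimal sketch of that isset() guard, applied to the location only. The function name classify_parts, the sample file name, and the rule "the first location code wins, later ones go to the content" are assumptions made for illustration. Color and language sit at the end of the name, so they would need their own rule.

```php
<?php
// Hypothetical helper illustrating the isset() guard for the location.
function classify_parts(string $line): array
{
    $options_location = array('DE', 'BE');
    $options_color    = array('RGB', 'CMYK', 'PMS');
    $options_language = array('ENG', 'GER');

    $parts = preg_split('/[-_.]/', $line);
    $data = array();
    $data['company']  = array_shift($parts); // first element is the company
    $data['filetype'] = array_pop($parts);   // last element is the file type
    $content = array();

    foreach ($parts as $part) {
        if (in_array($part, $options_location) && !isset($data['location'])) {
            $data['location'] = $part;       // first location code wins
        } elseif (in_array($part, $options_color)) {
            $data['color'] = $part;
        } elseif (in_array($part, $options_language)) {
            $data['lang'] = $part;
        } else {
            $content[] = $part;              // includes a repeated location code
        }
    }
    $data['content'] = implode(' ', $content);

    return $data;
}

print_r(classify_parts('COMPANY_DE-Guide-for-BE-RGB-ENG.pdf'));
```

Here the second location code, BE, ends up in the content rather than overwriting DE. A file without a location whose content contains DE or BE would still be misread, so a stricter version would also require the code to be the first part after the company.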
Upvotes: 1
Reputation: 17028
Something like this:
preg_match('#^([^_]+)(_[^-]+)?-([\w-]+)-(\w+)-(\w+)(\.\w+)$#i', 'COMPANY_DE-Actual-Contents-of-File-RGB-ENG.pdf', $m);
preg_match('#^([^_]+)(_[^-]+)?-([\w-]+)-(\w+)[_-]([^_]+)(\.\w+)$#i', 'COMPANY_DE-Actual-Contents-of-File-RGB-ENG.pdf', $m); // for both '_' and '-'
preg_match('#^(\p{Lu}+)(-\p{Lu}+)?-([\w]+)(\-(\p{Lu}+))?(\-(\p{Lu}+))?(\.\w+)$#', 'COMPANY-NL-Actual_Contents_of_File-RGB-ENG.pdf', $m); // if filename parts divider is strictly '-'
var_dump($m);
In the last variant, as you were asking, if there is no country code (-NL) it will be NULL. But with the color and language codes it's not. Try it yourself and you'll figure out how it works!
Upvotes: 0
Reputation: 5768
Try
$string = "COMPANY_DE-Actual-Contents-of-File-RGB-ENG.pdf";
$array = preg_split('/[-_\.]/', $string);
$len = count($array);
$struct = array($array[0], $array[1], '', $array[$len-3], $array[$len-2], $array[$len-1]);
unset($array[0], $array[1], $array[$len-3], $array[$len-2], $array[$len-1]);
$struct[2] = implode('-', $array);
var_dump($struct);
output:
array
0 => string 'COMPANY' (length=7)
1 => string 'DE' (length=2)
2 => string 'Actual-Contents-of-File' (length=23)
3 => string 'RGB' (length=3)
4 => string 'ENG' (length=3)
5 => string 'pdf' (length=3)
Upvotes: 1