Reputation: 13
I am trying to extract some formatted info from files.
Sample data
2011/09/20 00:57 367,044,608 S1E04 - Cancer Man.avi
2012/03/12 03:01 366,991,496 Family Guy - S09E01 - And Then There Were Fewer.avi
2012/03/25 00:27 53,560,510 Avatar- The Legend of Korra S01E01.avi
What I would like to extract is the date, the file size, and the file name, bearing in mind that the file name can start with basically anything and the file size is different for every file.
What I have currently:
$dateModifyed = substr($file, 0, 10);
$fileSize = preg_match('[0-9]*/[0-9]*/[0-9]*/s[0-9]*:[0-9]*/s*', $file, $match)
$FileName =
The full code that I am working on:
function recursivePrint($folder, $subFolders, $Jsoncounter) {
    $f = fopen("file.json", "a");
    echo '{ "id" : "' . $GLOBALS['Jsoncounter'] . '", parent" : "' . "#" . '", Text" : "' . $folder . '" },' . "\n";
    $PrintString = '{ "id" : "' . $GLOBALS['Jsoncounter'] . '", parent" : "' . "#" . '", Text" : "' . $folder . '" },' . "\n";
    fwrite($f, $PrintString);
    $foldercount = $GLOBALS['Jsoncounter'];
    $GLOBALS['Jsoncounter']++;
    foreach($subFolders->files as $file) {
        preg_match('/^(\d{4}/\d{2}/\d{2}\s+\d{2}:\d{2})\s+([\d,]+)\s+(.*)$/', $file, $match);
        $dateModified = $match[1];
        $fileSize = str_replace(',', '', $match[2]);
        $fileName = $match[3];
        echo $dateModified . $fileSize . $fileName;
        echo '{ "id" : "' . $GLOBALS['Jsoncounter'] . '", parent" : "' . $foldercount . '", Text" : "' . $file . '" },';
        $PrintString ='{ "id" : "' . $GLOBALS['Jsoncounter'] . '", parent" : "' . $foldercount . '", Text" : "' . $file . '" },';
        fwrite($f, $PrintString);
        $GLOBALS['Jsoncounter']++;
    }
    foreach($subFolders->folders as $folder => $subSubFolders) {
        recursivePrint($folder, $subSubFolders, $Jsoncounter);
    }
    fclose($f);
}
Upvotes: 1
Views: 326
Reputation: 47894
While preg_match() is certainly a viable technique and preg_match_all() can parse the whole file in one go, you should also consider the seldom-used fscanf() function, which is specifically designed to parse lines of predictably formatted text directly from a file handle. One difference from preg_match() and preg_match_all() is that it can return the desired values without any unneeded elements (like the full string match).
$result = [];
if ($handle = fopen($file, 'r')) {
    // %s = date, %*s = time (matched but discarded), %s = size, %[^\n] = rest of the line
    while (fscanf($handle, "%s %*s %s %[^\n]", $date, $size, $title)) {
        $result[] = [
            'date' => $date,
            'size' => (int) str_replace(',', '', $size),
            'title' => $title
        ];
    }
    fclose($handle);
}
echo json_encode($result); // print fully-formed, valid JSON string
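To see what each conversion in a format such as "%s %*s %s %[^\n]" does, here is a minimal sketch that applies it to a single line with sscanf(), which uses the same format rules as fscanf(). The sample line is copied from the question; the variable names are only illustrative.

```php
<?php
// Sample line taken from the question's data.
$line = '2012/03/12 03:01 366,991,496 Family Guy - S09E01 - And Then There Were Fewer.avi';

// %s     -> date (everything up to the first whitespace)
// %*s    -> time, matched but not assigned to any variable
// %s     -> size, still containing the thousands separators
// %[^\n] -> the rest of the line, spaces included
$assigned = sscanf($line, "%s %*s %s %[^\n]", $date, $size, $title);

// sscanf() returns the number of assigned values when variables are passed.
if ($assigned === 3) {
    $size = (int) str_replace(',', '', $size);
    echo "$date | $size | $title\n";
}
```

Checking the returned count before using the variables guards against lines that do not follow the expected layout.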
It is important to avoid the temptation to build JSON strings by hand: doing so risks generating invalid JSON, which can be a headache to repair.
Notice how you have:
echo '{ "id" : "' . $GLOBALS['Jsoncounter'] . '", parent" : "' . $foldercount . '", Text" : "' . $file . '" },';
// whoops ---------------------------------------^---------------------------------^
// your manually written json is missing leading double quotes on two keys
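As a rough sketch of the alternative, the id/parent/Text entries can be collected in a PHP array and encoded once at the end. The folder name, the second file name, and the variable names here are made up for illustration; only the key names and the "#" root marker come from the question.

```php
<?php
// Hypothetical node list mirroring the id / parent / Text shape from the question.
$nodes = [];
$counter = 0;

// Root folder entry; "#" marks a top-level node, as in the question's output.
$folderId = $counter++;
$nodes[] = ['id' => (string) $folderId, 'parent' => '#', 'Text' => 'Videos'];

// One child entry per file. The second name contains double quotes on purpose:
// json_encode() escapes them, which hand-built strings would not.
foreach (['S1E04 - Cancer Man.avi', 'He said "hi".avi'] as $fileName) {
    $nodes[] = [
        'id'     => (string) $counter++,
        'parent' => (string) $folderId,
        'Text'   => $fileName,
    ];
}

$json = json_encode($nodes, JSON_PRETTY_PRINT);
echo $json, "\n";
```

Encoding the whole array in one call also avoids the trailing comma that appending one object per line tends to leave behind.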
Upvotes: 0
Reputation: 780994
You need to use capture groups to get the parts of the string that are matched by different parts of the regular expression. Capture groups use parentheses around portions of the regexp.
preg_match('#^(\d{4}/\d{2}/\d{2}\s+\d{2}:\d{2})\s+([\d,]+)\s+(.*)$#', $string, $match);
$dateModified = $match[1];
$fileSize = str_replace(',', '', $match[2]);
$fileName = $match[3];
Other problems in your regexp:
You wrote /s instead of \s for whitespace characters.
There's a tutorial on regular expressions at www.regular-expressions.info.
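One thing worth adding when reading capture groups is a check on the return value of preg_match(), since $match is not filled in when a line does not fit the pattern. A minimal sketch, where parseListingLine() is a hypothetical helper name and the pattern is the one shown above:

```php
<?php
// Hypothetical helper: returns the parsed fields, or null when the line does not match.
function parseListingLine(string $line): ?array
{
    $pattern = '#^(\d{4}/\d{2}/\d{2}\s+\d{2}:\d{2})\s+([\d,]+)\s+(.*)$#';

    // preg_match() returns 1 on a match, 0 on no match, false on error.
    if (preg_match($pattern, $line, $match) !== 1) {
        return null;
    }

    return [
        'dateModified' => $match[1],
        'fileSize'     => (int) str_replace(',', '', $match[2]),
        'fileName'     => $match[3],
    ];
}

$parsed  = parseListingLine('2012/03/25 00:27 53,560,510 Avatar- The Legend of Korra S01E01.avi');
$skipped = parseListingLine('not a listing line');

var_dump($parsed);
var_dump($skipped);
```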
Upvotes: 1
Reputation: 336158
There are several problems in your regex:
preg_match('[0-9]*/[0-9]*/[0-9]*/s[0-9]*:[0-9]*/s*', $file, $match)
The opening delimiter is missing, /s matches a literal s instead of whitespace (\s), and an asterisk is used where a plus is needed. You also haven't used anchors or capturing groups, and the regex isn't finished yet.
Try the following:
// $file must hold the listing text itself (one entry per line), not a file path
preg_match_all(
    '%^                      # Start of line
    ([0-9]+/[0-9]+/[0-9]+)   # Date (group 1)
    \s+                      # Whitespace
    ([0-9]+:[0-9]+)          # Time (group 2)
    \s+                      # Whitespace
    ([0-9,]+)                # File size (group 3)
    \s+                      # Whitespace
    (.*)                     # Rest of the line%mx',
    $file, $result, PREG_SET_ORDER);
for ($matchi = 0; $matchi < count($result); $matchi++) {
    for ($backrefi = 0; $backrefi < count($result[$matchi]); $backrefi++) {
        # Matched text = $result[$matchi][$backrefi];
    }
}
So, for example, with the sample data $result[0][1] will contain 2011/09/20, and $result[2][4] will contain Avatar- The Legend of Korra S01E01.avi, etc.
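If the nested index loops feel heavy, one possible variation is to walk the PREG_SET_ORDER result with foreach and unpack each match into named variables. This is only a sketch: it assumes the three sample lines from the question as input, uses the same pattern without the x flag and comments, and the names $listing and $rows are made up.

```php
<?php
// Assumed input: the three sample lines from the question as a single string.
$listing = <<<TXT
2011/09/20 00:57 367,044,608 S1E04 - Cancer Man.avi
2012/03/12 03:01 366,991,496 Family Guy - S09E01 - And Then There Were Fewer.avi
2012/03/25 00:27 53,560,510 Avatar- The Legend of Korra S01E01.avi
TXT;

preg_match_all(
    '%^([0-9]+/[0-9]+/[0-9]+)\s+([0-9]+:[0-9]+)\s+([0-9,]+)\s+(.*)%m',
    $listing,
    $result,
    PREG_SET_ORDER
);

$rows = [];
foreach ($result as $match) {
    // Index 0 is the whole matched line; indexes 1 to 4 are the capture groups.
    [$whole, $date, $time, $size, $name] = $match;
    $rows[] = [
        'date' => $date,
        'time' => $time,
        'size' => (int) str_replace(',', '', $size),
        'name' => $name,
    ];
}

echo count($rows), " rows parsed\n";
```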
Upvotes: 1