user774528
user774528

Reputation: 187

Group rows of a 2d arrays by multiple columns and only if date values are consecutive

I have multiple associative arrays that I want to merge if all values except startdate are the same. If two arrays are indeed the same, I want to merge them and create a new element enddate so that startdate and enddate show the date range. All dates within the range must be represented in the original arrays, i.e. if a date is missing, the dates on either side of it must not be merged.

Array('color'=>'red','size'=>'large','shape'=>'circle','startdate'=>'2011-08-17')
Array('color'=>'red','size'=>'large','shape'=>'circle','startdate'=>'2011-08-18')
Array('color'=>'red','size'=>'large','shape'=>'square','startdate'=>'2011-08-20')

should become:

Array('color'=>'red','size'=>'large','shape'=>'circle','startdate'=>'2011-08-17','enddate'=>'2011-08-18')
Array('color'=>'red','size'=>'large','shape'=>'square','startdate'=>'2011-08-20')

So far I have tried looping through each array and creating a multidimensional array:

foreach($arrays as $id => $array){
    $mergearray[$array['red']][$array['large']][$array['circle']] = $id;
}

in order to check whether another array has the same values. I'm trying to use those arrays to reconstruct arrays in the original structure.

Upvotes: 5

Views: 286

Answers (6)

mickmackusa
mickmackusa

Reputation: 48071

Ensure that your input data is at least sorted by startdate values.

Iterate over the rows, and use a string built from the 3 identifying columns to determine the grouping.

If a group is encountered for the first time or the current row does not immediately follow the last date of the dedicated group, then start a new group.

If a group is encountered more than once and has a consecutive date, then write that new date as the new enddate.

Using references prevents needing to keep track of where previous values have been pushed into the result array. Demo

usort($array, fn($a, $b) => $a['startdate'] <=> $b['startdate']);
$result = [];
foreach ($array as $row) {
    $compositeKey = implode(
        '_',
        [$row['color'], $row['size'], $row['shape']]
    );
    if (
        !isset($ref[$compositeKey])
        || ($row['startdate'] != date('Y-m-d', strtotime(($ref[$compositeKey]['enddate'] ?? $ref[$compositeKey]['startdate']) . ' + 1 day')))
    ) {
        if (isset($ref[$compositeKey])) {
           unset($ref[$compositeKey]);
        }
        $ref[$compositeKey] = $row;
        $result[] =& $ref[$compositeKey];
    } else {
        $ref[$compositeKey]['enddate'] = $row['startdate'];
    }
}
var_export($result);

Upvotes: 0

Tim
Tim

Reputation: 2403

function group_and_sort($data)
{
    $out = array();
    while($data) {

        // Shift off the first element
        $buffer = array_shift($data);
        $end_date = $buffer['startdate'];

        // Try to group successive elements...
        while($data) {

            // Case 1: Does the next element differ by more than just date?
            if(count(array_diff_assoc($buffer, $data[0])) > 1) {
                break;
            }

            // Case 2: Does the next element have an unexpected date?
            $expected_date = date('Y-m-d', strtotime('+1 day', strtotime($end_date)));
            if($data[0]['startdate'] != $expected_date) {
                break;
            }

            // Otherwise, push end_date forward and throw away the element
            $end_date = $data[0]['startdate'];
            array_shift($data);
        }

        // If the dates differ, record the range.
        if($buffer['startdate'] != $end_date) {
            $buffer['enddate'] = $end_date;
        }

        $out[] = $buffer;
    }

    return $out;
}

Assumes the elements are already sorted by date. If they're not, you could use:

function sort_startdate($a, $b)
{
    return strcmp($a['startdate'], $b['startdate']);
}
usort($data, 'sort_startdate');

prior to passing it to group_and_sort($data).

Upvotes: 0

Ben XO
Ben XO

Reputation: 1229

Given that you have an array of arrays already, what you're actually trying to do is DELETE consecutive entries (other than the first entry for each time span).

Your algorithm would be:

$expected_start_time= 0; // initial val
foreach($all_entries as $k => &$v) {
    $start_time = strtotime($v['startdate']);
    if($start_time != $expected_start_time) {
        $range_start =& $v; // this is a range beginning. Put end date in here
    } else {
        $range_start['enddate'] = $v['startdate'];
        unset($all_entries[$k]);
    }
    $expected_date = strtotime('+1 day', $start_time);
}

This is basically a more minimal, and in-place version of Dave Child's answer.

Upvotes: 0

Morg.
Morg.

Reputation: 701

Avoid the day/ date functions it's a waste of cpu time for no added benefit (answer from Dave). Avoid the massive array function use (answer from adlawson), same waste of time, just ordering arrays and processing the smart way will be much faster.

But mostly, what are you trying to do, why are those arrays like that and ... is there an SQL behind all that ?

Because if there is, there's much simpler solutions than attempting to merge badly-formed arrays (with all due respect, the transformation you seek implies that something went wrong at some point.. a start_time becoming an end_time denotes an attempt at keeping a full item history, which has no place in PHP itself imho).

I'll give you the right PHP code if it really is what you need but I have quite a doubt it's the correct way to go forward for your application.

Upvotes: 1

adlawson
adlawson

Reputation: 6431

I can't see the actual implementation of such a merge, but this should work. It may look long winded, but it takes care of a few things as it goes.

/**
 * @example array_merge_allbutstartdate($array1, $array2, $array3, $array4, $arr...)
 * @param array $array1
 * @param array $array2, $arr...
 * @return array
 */
function array_merge_allbutstartdate(array $array1, array $array2)
{
    $out  = array();
    $date = array();
    $startKey = 'startdate';
    $endKey   = 'enddate';

    foreach (func_get_args() as $item) {
        if (!is_array($item)) {
            trigger_error('All arguments should be an array.', E_USER_ERROR);
        }

        $temp = null;
        if (isset($item[$startKey])) {
            $temp = $item[$startKey];
            unset($item[$startKey]);
        }

        if (!in_array($item, $out)) {
            $i = count($out);
            $out[] = $item;
        } else {
            $i = array_search($item, $out);
        }

        if (null !== $temp) {
            $date[$i][] = $temp;
        }
    }

    foreach ($date as $j => $row) {
        array_map('strtotime', $row);
        $start = array_search(min($row), $row);
        $end = array_search(max($row), $row);

        // Add start date
        $out[$j][$startKey] = $date[$j][$start];

        // Only add end date if it is not equal to start date
        if ($date[$j][$start] !== $date[$j][$end]) {
            $out[$j][$endKey] = $date[$j][$end];
        }
   }

    return $out;
}

Upvotes: 0

Dave Child
Dave Child

Reputation: 7901

Something like this should do the trick:

$newArray = array();
$newLine = false;
$prev_day = false;
foreach ($array as $line) {
    $this_day = strtotime($line['startdate']);
    if (($newLine) && ($prev_day == strtotime('-1 day', $this_day))) {
        $newLine['enddate'] = $line['startdate'];
        $newArray[] = $newLine;
        $newLine = false;
    } elseif (!$newLine) {
        $newLine = $line;
    }
    if ($newLine) {
        $newLine['enddate'] = $line['startdate'];
    }
    $prev_day = $this_day;
}
$newArray[] = $newLine;

Upvotes: 0

Related Questions