puelo

Reputation: 6037

Counting similarity of arrays inside array

I have a problem that I am unsure how to solve.

Given arrays in the following format:

$array01 = array(
    0 => array("hallo", "welt", "du", "ich"),
    1 => array("mag", "dich"),
    2 => array("nicht", "haha", "huhu")
);

$array02 = array(
    0 => array("haha", "welt", "dich"),
    1 => array("hallo", "mag", "nicht"),
    2 => array("du", "ich", "huhu")
);

Now I want to calculate some kind of similarity value for these arrays. They are the result of clustering terms according to their meaning.

What I want to know is how similarly two different users clustered these terms ($array01 = user1, $array02 = user2). The keys 0, 1, 2 are the clusters (they don't have to be the same length).

EDIT: Let me describe this a little further. Every array is the result of one user clustering the terms (hallo, welt, du, ich, ...) according to their meaning, so every sub-array is one cluster defined by that user. The problem is that users are not restricted in where they place a term or a whole cluster, so I cannot just compare $array01[0] with $array02[0]. I guess I need to compare the sub-arrays that have the most terms in common. Every user HAS to cluster all terms, though.

So for example:

$array01[0] and $array02[2] have 2 terms in common, "du" and "ich" -> +1

The other terms have no clear clustering, so I would guess this example should yield 1, because the clusterings are not very similar.

Upvotes: 0

Views: 98

Answers (3)

mpyw

Reputation: 5754

How about this?


get_similar_items

Code:

<?php

$array01 = array(
    0 => array("hallo", "welt", "du", "ich"),
    1 => array("mag", "dich"),
    2 => array("nicht", "haha", "huhu")
);

$array02 = array(
    0 => array("haha", "welt", "dich"),
    1 => array("hallo", "mag", "nicht"),
    2 => array("du", "ich", "huhu")
);

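function get_similar_items() {
    $arrs = func_get_args();
    foreach ($arrs as &$arr) {
        // Flatten one level. each() was removed in PHP 8, so build a new list instead.
        $flat = array();
        foreach ($arr as $v) {
            $flat = array_merge($flat, is_array($v) ? $v : array($v));
        }
        $arr = $flat;
    }
    unset($arr); // break the reference left behind by the foreach
    // Keep only the values that occur in every flattened array.
    return call_user_func_array('array_intersect', $arrs);
}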

print_r(get_similar_items($array01,$array02));

Result:

Array
(
    [0] => hallo
    [1] => welt
    [2] => du
    [3] => ich
    [4] => mag
    [5] => dich
    [6] => nicht
    [7] => haha
    [8] => huhu
)

get_similar_items_count

Code:

<?php

$array01 = array(
    0 => array("hallo", "welt", "du", "ich"),
    1 => array("mag", "dich"),
    2 => array("nicht", "haha", "huhu")
);

$array02 = array(
    0 => array("haha", "welt", "dich"),
    1 => array("hallo", "mag", "nicht"),
    2 => array("du", "ich", "huhu")
);

$array03 = array(
    0 => array("haha", "haha", "dich"),
    1 => array("dich", "mag", "mag"),
    2 => array("du", "ich", "haha")
);

function get_similar_items_count() {
    $arrs = func_get_args();
    $counts = array();
    foreach ($arrs as $arr) {
        // Flatten one level. each() was removed in PHP 8, so build a new list instead.
        $flat = array();
        foreach ($arr as $v) {
            $flat = array_merge($flat, is_array($v) ? $v : array($v));
        }
        // Add this array's per-word counts to the running totals.
        foreach (array_count_values($flat) as $k => $v) {
            if (!isset($counts[$k])) {
                $counts[$k]  = $v;
            } else {
                $counts[$k] += $v;
            }
        }
    }
    return $counts;
}

print_r(get_similar_items_count($array01,$array02,$array03));

Result:

Array
(
    [hallo] => 2
    [welt] => 2
    [du] => 3
    [ich] => 3
    [mag] => 4
    [dich] => 4
    [nicht] => 2
    [haha] => 5
    [huhu] => 2
)

Upvotes: 2

CrayonViolent

Reputation: 32517

Based on your comment, my understanding is that you want to compare all the values in the first array to those in the second array. In other words, all words within all subarrays of $array01 should be compared to all words of all subarrays of $array02.

$array01 = array(
    0 => array("hallo", "welt", "du", "ich"),
    1 => array("mag", "dich"),
    2 => array("nicht", "haha", "huhu")
);

$array02 = array(
    0 => array("haha", "welt", "dich"),
    1 => array("hallo", "mag", "nicht"),
    2 => array("du", "ich", "huhu")
);

$t_array01 = array();
foreach($array01 as $arr) {
  $t_array01 = array_merge($t_array01,$arr);
}
$t_array02 = array();
foreach($array02 as $arr) {
  $t_array02 = array_merge($t_array02,$arr);
}

$common = array_intersect($t_array01,$t_array02);

$common is the array of all words that appear in both arrays. In your example, both arrays contain exactly the same words, so it holds all of them. If you just want to know how many there are, use count($common).

Upvotes: 1

Wrikken

Reputation: 70500

count(array_intersect($array01[0],$array02[0]));

Possibly foreach() through both arrays and sum the results.
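
One way to read that suggestion, sketched here under assumptions: since the clusters cannot be matched by index, take for every cluster of the first user the largest overlap with any cluster of the second user, and add those maxima up. The function name best_overlap_score is made up for illustration, and this is only one possible similarity measure, not necessarily the one the question is after.

```php
<?php
// Hypothetical helper (name and scoring rule are assumptions):
// for every cluster in $a, find the cluster in $b that shares the most
// terms with it, and sum those best overlap counts.
function best_overlap_score(array $a, array $b) {
    $score = 0;
    foreach ($a as $clusterA) {
        $best = 0;
        foreach ($b as $clusterB) {
            // Number of terms the two clusters have in common.
            $best = max($best, count(array_intersect($clusterA, $clusterB)));
        }
        $score += $best;
    }
    return $score;
}

$array01 = array(
    0 => array("hallo", "welt", "du", "ich"),
    1 => array("mag", "dich"),
    2 => array("nicht", "haha", "huhu")
);

$array02 = array(
    0 => array("haha", "welt", "dich"),
    1 => array("hallo", "mag", "nicht"),
    2 => array("du", "ich", "huhu")
);

echo best_overlap_score($array01, $array02); // prints 4
```

For the example data the best overlaps are 2, 1 and 1, which gives 4 out of 9 terms. Two identical clusterings would score 9, so dividing by the total number of terms yields a value of at most 1. Note that the score is not guaranteed to be the same when the two arguments are swapped.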

Upvotes: 1
