Reputation: 1105
As of now this is my code. I got stuck on how to look up/compare the data of payment.csv to transactions.csv.
<?php
//this shows the name, email, deposit date and amount from payment.csv
$data = file("data/payment.csv");
foreach ($data as $deposit){
$depositarray = explode(",", $deposit);
$depositlist = $depositarray;
$name = $depositlist[0];
$email = $depositlist[1];
$depositdate = $depositlist[9];
$depositamount = $depositlist[10];
//echo $depositamount;
}
//this shows the payment date and amount from transaction.csv
$databank = file("datas/transactions.csv");
foreach ($databank as $payment){
$paymentarray = explode(",", $payment);
$paymentlist = $paymentarray;
$paymentdate = $paymentlist[0];
$paymentamount = $paymentlist[5];
//echo $paymentamount;
}
?>
Example:
From payment.csv
, every ($depositdate
&& $depositamount
) will compare to transactions.csv
($paymentdate
&& $paymentamount
).
Every row with matched will save to an array
then display on table later. If not matched save to an array and display later.
Would anyone help me or give me an idea to accomplish this? Just to display all rows with matched data.
This should be the output when a transaction have multiple matched
Upvotes: 1
Views: 2736
Reputation: 4513
The requirements: Find and record, transaction records that have a match with a corresponding master record.
A list of all the transaction records that match a master record must be kept.
The comparison is based on 'date' and amount. (updated to easily allow any values from the current row, as you pass an array of 'column names' to use.)
The issue is that this can get quite expensive if the arrays are not sorted by the keys you want to compare.
One approach is to generate a 'key' that is unique for each 'data key fields' but is easy to generate and is a fixed size to make comparisons easy.
Then use these 'keys' to generate a lookup
array of 'generated key' to the original record.
This saves having to provide sorted data on the fields you want to compare. However, the resulting lookup array must fit in memory.
I decided to use an MD5 hash for the concatenated data keys. The chances of collisions are not important in this application. And MD5 is excellent at generating unique hashes. It is also quick.
// ---------------------------------------------------------------------------------
class HashMatch {
/*
* Generate a MD5 hash for each master and tranasaction using some
* of the data fields as the string to be hashed.
*/
/**
* Master source records
*
* @var array
*/
private $master = null;
/**
* Transaction Source records must have the same field names as the master
* of the indexes that are used to generate the MD5 hash
*
* @var array
*/
private $transaction = null;
/**
* The generated MD5 hash is the key in the Master source records.
*
* Each record has a list of other Master Record Ids that also have the same hash
*
* @var array
*/
private $hashMaster = array();
/**
* The generated MD5 hash is the key in the Transaction source records.
*
* Each record has a list of other Transaction Record Ids that also have the same hash
*
* @var array
*/
private $hashTransaction = array();
/**
* Specify which index names to use from the supplied data record arrays
* to generate the MD5 hash with.
*
* @var array
*/
private $keyNames = array();
/**
* Generate a MD5 hash for each master and transaction using some
* of the data fields as the string to be hashed.
*
* You can pass an array of field names to used to generate the key.
*
* This allows any records to be used in this class as you just provide
* the li9st of names to generate the MD5 hash
*
*
* @param array $master
* @param array $transaction
* @param array $keyNames
*
* @return void
*/
public function __construct(array $master,
array $transaction,
array $keyNames = array('when', 'amount'))
{
$this->master = $master;
$this->transaction = $transaction;
$this->keyNames = $keyNames;
}
/**
* Generate all the Hashes and store all the matching details
*
* @return bool
*/
public function generateMatches()
{
$this->processMaster();
$this->processTransaction();
return !empty($this->hashMaster) && !empty($this->hashTransaction);
}
/**
* Generate a list of MD5 hashes as a key
*
* Keep a list of other master records with the same hash
*
* @return void
*/
public function processMaster()
{
foreach ($this->master as $recordId => $data) {
$hash = $this->generateHash($data);
if (empty($this->hashMaster[$hash])) { // add it...
$this->hashMaster[$hash]['masterId'] = $recordId;
$this->hashMaster[$hash]['matchIds'] = array($recordId);
}
else { // is a duplicate so add to the match list
$this->hashMaster[$hash]['matchIds'][] = $recordId;
}
}
}
/**
* Generate a list of MD5 hashes as a key for the Transaction source
*
* Match the hashes against the master list and record if there is a match
*
* @return void
*/
public function processTransaction()
{
foreach ($this->transaction as $recordId => $data) {
$hash = $this->generateHash($data);
if (empty($this->hashMaster[$hash])) { // skip this record
continue;
}
// record a match with the master
if (empty($this->hashTransaction[$hash])) { // new record
$this->hashTransaction[$hash]['masterId'] = $this->hashMaster[$hash]['masterId'];
$this->hashTransaction[$hash]['matchIds'] = array();
}
// add to the list of matches
$this->hashTransaction[$hash]['matchIds'][] = $recordId;
}
}
/**
* Return Master MD5 list
*
* The keys are unique, however there are extra values:
*
* 'masterId' ==> The first record in the array with this key
*
* 'matchIds' ==> A *complete* list of all the master records that have this key.
* Yes, it includes itself, this allows you to just use this list
* when reporting.
*
* @return array
*/
public function getHashMasterList()
{
return $this->hashMaster;
}
/**
* Return Master MD5 list with more that one matching master
*
* i.e. duplicate master records with the same hash
*
* @return array
*/
public function getHashMatchedMasterList()
{
$out = array();
foreach ($this->hashMaster as $key => $item) {
if (count($item['matchIds']) >= 2) {
$out[$key] = $item;
}
}
return $out;
}
/**
* All the transactions that matched a master record
*
* @return array
*/
public function getHashTransactionList()
{
return $this->hashTransaction;
}
/**
* given a master hash then return the details as:
*
* i.e. this converts a hash key back into source records for processing.
*
* 1) A list of matching master records
*
* e.g. $out['master'][] ...
*
*
* 2) A list of matching transaction records
*
* e.g. $out['transaction'][] ...
*
* @param string $hash
*
* @return array
*/
public function getMatchedRecords($hash)
{
$out = array('key' => $hash,
'master' => array(),
'transaction' => array(),
);
if (!empty($this->hashMaster[$hash])) { // just in case is invalid hash
foreach ($this->hashMaster[$hash]['matchIds'] as $recordId) {
$out['master'][] = $this->master[$recordId];
}
}
if (!empty($this->hashTransaction[$hash])) {
foreach ($this->hashTransaction[$hash]['matchIds'] as $recordId) {
$out['transaction'][] = $this->transaction[$recordId];
}
}
return $out;
}
/**
* Generate an MD5 hash from the required fields in the data record
* The columns to use will have been passed in the constructor
* and found in '$keyNames'
*
* It is so you don't have to edit anything to use this class
*
* @param array $row
*
* @return string
*/
public function generateHash($row)
{
$text = '';
foreach ($this->keyNames as $name) {
$text .= $row[$name];
}
return Md5($text);
}
}
later....
// !!!! You can pass the names of the fields to be used to generate the key
$match = new HashMatch($master,
$transaction,
array('whenDone', 'amount'));
$match->generateMatches();
// print output...
echo '<pre>Hash Master Records with multiple Matching Masters ... ', PHP_EOL;
print_r($match->getHashMatchedMasterList());
echo '</pre>';
Matching Master to Transaction...
Array
(
[key] => 296099e19b77aad413600a1e2f2cb3cd
[master] => Array
(
[0] => Array
(
[name] => John Matched
[whenDone] => 2016-04-01
[amount] => 12345
[email] => [email protected]
)
[1] => Array
(
[name] => Jane Matched
[whenDone] => 2016-04-01
[amount] => 12345
[email] => [email protected]
)
)
[transaction] => Array
(
[0] => Array
(
[name] => John Doe
[whenDone] => 2016-04-01
[amount] => 12345
[email] => [email protected]
)
[1] => Array
(
[name] => micky mean
[whenDone] => 2016-04-01
[amount] => 12345
[email] => [email protected]
)
)
)
$master[] = array('name' => 'First last', 'whenDone' => '2016-03-03', 'amount' => 12000, 'email' => '[email protected]', );
$master[] = array('name' => 'John Matched', 'whenDone' => '2016-04-01', 'amount' => 12345, 'email' => '[email protected]');
$master[] = array('name' => 'Jane Unmatched', 'whenDone' => '2016-05-02', 'amount' => 12345, 'email' => '[email protected]');
$master[] = array('name' => 'Jane Matched', 'whenDone' => '2016-04-01', 'amount' => 12345, 'email' => '[email protected]');
$transaction[] = array('name' => 'Mary Lamb', 'whenDone' => '2016-03-04', 'amount' => 12000, 'email' => '[email protected]');
$transaction[] = array('name' => 'John Doe', 'whenDone' => '2016-04-01', 'amount' => 12345, 'email' => '[email protected]');
$transaction[] = array('name' => 'micky mean', 'whenDone' => '2016-04-01', 'amount' => 12345, 'email' => '[email protected]');
Upvotes: 3
Reputation: 21
First of all, if it's valid csv-files, you should parse the data via fgetcsv() instead of relying upon that there's no comma inside of a string.
You could create an index-array for deposit-date while reading csv1, and simply lookup this index while reading csv2. If there's matching dates, compare the amounts and proceed further.
something like this:
// read csv1, store in array and create index
$data = array();
$fh = fopen($csv1, 'r');
while($row = fgetcsv($fh)) {
$data[] = $row;
$val = $row[$interestd_in];
$key = count($data) - 1;
$idx[$val][] = $key; // array containing all indices
}
fclose($fh);
// read csv2, lookup in index and process further
$fh = fopen($csv2, 'r');
while($row2 = fgetcsv($fh)) {
$val = $row2[$interest2];
if(!empty($idx[$val])) {
foreach($idx[$val] as $key) {
$row1 = $data[$key];
/*
do your further comparisons of these 2 data-lines
and output, if matches found
*/
}
}
}
fclose($fh);
Upvotes: 0
Reputation: 3264
According to @Ryan Vincent's comment:
<?php
$masterData = array();
//this shows the name, email, deposit date and amount from payment.csv
$data = file("data/payment.csv");
foreach ($data as $deposit){
$depositarray = explode(",", $deposit);
$key = md5($depositlist[9] . $depositlist[10]); //date + amount
$depositlist = $depositarray;
$masterData[$key]['payment'] = array(
'name' => $depositlist[0],
'email' => $depositlist[1],
'depositdate' => $depositlist[9],
'depositamount' => $depositlist[10]
);
}
//this shows the payment date and amount from transaction.csv
$databank = file("datas/transactions.csv");
foreach ($databank as $payment){
$paymentarray = explode(",", $payment);
$key = md5($paymentlist[0] . $paymentlist[5]); //date + amount
$masterData[$key]['transaction'] = array(
'paymentdate' => $paymentlist[0],
'paymentamount' => $paymentlist[5]
);
}
?>
Now you have an array $masterData
which have all data with same date and amount under same key.
But I still do not think that this list is good for anything because you do not know which payment belongs to which transaction because date and amount could be same.
However, if you do now a check like:
<?php
foreach($masterData as $data) {
echo explode(',', $data[0]);
if(count($data) == 2) {
echo explode(',', $data[1]) . ', matched';
}
echo '<br/>';
}
?>
you should have each line with your data and at the end of line a matched
when there was a transaction.
But like I said - since people could have a transaction with same amount on same date, you do not know which transaction belongs to which person.
Upvotes: 1