Reputation: 67
My question is easiest to explain with an example.
Suppose this is the text file, which contains these lines:
hello this is my word file and this is line number 1
hello this is second line and this is some text
hello this is third line and again some text
jhasg djgha sdgasjhgdjasgh jdkh
sdhgfkjg sdjhgf sjkdghf sdhf
s hdg fjhsgd fjhgsdj gfj ksdgh
In the above example, the output should be:
hello this is my word file and this is line number 1
jhasg djgha sdgasjhgdjasgh jdkh
sdhgfkjg sdjhgf sjkdghf sdhf
s hdg fjhsgd fjhgsdj gfj ksdgh
Because "hello this is line" is more than 3 words shared between the lines, the lines containing those words are deleted. Please note that the first line is not deleted, because it is still unique when it is encountered.
I tried to code this myself and made a mess: it produced a 200 MB text file containing the first line repeated endlessly. Anyway, here is the code. Don't execute it, or you can end up with a full hard disk.
<?php
$fileA = fopen("names.txt", "r");
$fileB = fopen("anothernames.txt", "r");
$fileC = fopen("uniquenames.txt", "w");
while(!feof($fileA))
{
    $line = fgets($fileA);
    $words = explode(" ", $line);
    $size = count($words);
    while(!feof($fileA))
    {
        $line1 = fgets($fileB);
        $words1 = explode(" ", $line1);
        $size1 = count($words1);
        $c=0;
        for($i=0; $i<$size; $i++)
        {
            for($j=0; $j<$size1; $j++)
            {
                if($words[$i]==$words1[$j])
                    $c++;
            }
        }
        if($c<3)
            fwrite($fileC, $line);
    }
}
fclose($fileA);
fclose($fileB);
fclose($fileC);
?>
Thanks
Upvotes: 1
Views: 509
Reputation: 20899
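An easy approach would be the following: read the file into an array of lines with file(), record for each word the lines it appears in, and blacklist every line containing a word that appears in 3 or more lines.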
Example:
<?php
$lines = array("hello this is my word file and this is line number 1",
    "hello this is second line and this is some text",
    "hello this is third line and again some text",
    "jhasg djgha sdgasjhgdjasgh jdkh",
    "sdhgfkjg sdjhgf sjkdghf sdhf",
    "s hdg fjhsgd fjhgsdj gfj ksdgh");
//to read from a file instead (the flag strips the trailing newline of each line):
//$lines = file("path/to/file", FILE_IGNORE_NEW_LINES);
$result = array();
//build "lines-per-word" array
foreach ($lines AS $line){
    $words = explode(" ", $line);
    foreach ($words AS $word){
        $word = strtolower($word);
        if (isset($result[$word]))
            $result[$word][] = $line;
        else
            $result[$word] = array($line);
    }
}
//Blacklist each sentence containing a word that appears in 3 or more sentences.
$blacklist = array();
foreach ($result AS $word => $entries){
    if (count($entries) >= 3){
        foreach($entries AS $entry){
            $blacklist[] = $entry;
        }
    }
}
//list all lines that are not blacklisted.
foreach ($lines AS $line){
    if (!in_array($line, $blacklist))
        echo $line."<br />";
}
?>
Output:
jhasg djgha sdgasjhgdjasgh jdkh
sdhgfkjg sdjhgf sjkdghf sdhf
s hdg fjhsgd fjhgsdj gfj ksdgh
Note that this will also blacklist a single sentence that contains the same word 3 times, such as "Foo Foo Foo bar".
To avoid this, check whether the line is already "known" for a given word before pushing it to the array:
foreach ($words AS $word){
    $word = strtolower($word);
    if (isset($result[$word])){
        //only record the line once per word
        if (!in_array($line, $result[$word])){
            $result[$word][] = $line;
        }
    }else
        $result[$word] = array($line);
}
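The same idea can be sketched as a single function. This is only a minimal sketch under assumptions: the name filterByWordFrequency and the $minLines parameter are made up for illustration, and it stores line indexes as array keys so that a word repeated within one line is counted once without needing in_array().

```php
<?php
// Hypothetical helper (not from the answer above): drops every line that
// contains a word occurring in at least $minLines distinct lines.
function filterByWordFrequency(array $lines, int $minLines = 3): array
{
    // word => set of line indexes containing it (indexes stored as keys)
    $linesPerWord = [];
    foreach ($lines as $index => $line) {
        $words = preg_split('/\s+/', strtolower(trim($line)), -1, PREG_SPLIT_NO_EMPTY);
        foreach ($words as $word) {
            // a word repeated inside one line only sets the same key again
            $linesPerWord[$word][$index] = true;
        }
    }

    // Collect the indexes of all lines that must be dropped.
    $blacklist = [];
    foreach ($linesPerWord as $indexes) {
        if (count($indexes) >= $minLines) {
            $blacklist += $indexes;
        }
    }

    // Keep the remaining lines in their original order.
    return array_values(array_diff_key($lines, $blacklist));
}

$sample = [
    "hello this is my word file and this is line number 1",
    "hello this is second line and this is some text",
    "hello this is third line and again some text",
    "jhasg djgha sdgasjhgdjasgh jdkh",
    "sdhgfkjg sdjhgf sjkdghf sdhf",
    "s hdg fjhsgd fjhgsdj gfj ksdgh",
];
echo implode("\n", filterByWordFrequency($sample)), "\n";
```

With the sample lines this prints only the last three lines, because "hello", "this", "is", "and" and "line" each appear in three different lines.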
Upvotes: 1
Reputation: 360742
Why not just use array_intersect()?
php > $l1 = 'hello this is my word file and this is line number 1';
php > $l2 = 'hello this is second line and this is some text';
php > $a1 = explode(" ", $l1);
php > $a2 = explode(" ", $l2);
php > var_dump(array_intersect($a1, $a2));
array(7) {
[0]=>
string(5) "hello"
[1]=>
string(4) "this"
[2]=>
string(2) "is"
[6]=>
string(3) "and"
[7]=>
string(4) "this"
[8]=>
string(2) "is"
[9]=>
string(4) "line"
}
if (count(array_intersect($a1, $a2)) >= 3) {
    // skip this line
}
Or am I reading your "matching" too loosely?
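Applied to a whole file, the intersection check might look like the sketch below. It rests on assumptions that the question does not spell out: each line is compared only against the lines kept so far, repeated words within a line are counted once (hence array_unique), and "matching" means sharing 3 or more words. The function name dropSimilarLines is hypothetical.

```php
<?php
// Hypothetical helper: keeps a line only if it shares fewer than $threshold
// distinct words with every line that has been kept so far.
function dropSimilarLines(array $lines, int $threshold = 3): array
{
    $kept = [];      // lines that go to the output
    $keptWords = []; // distinct words of each kept line
    foreach ($lines as $line) {
        $words = array_unique(explode(' ', trim($line)));
        foreach ($keptWords as $other) {
            if (count(array_intersect($words, $other)) >= $threshold) {
                continue 2; // too similar to an earlier line: skip it
            }
        }
        $kept[] = $line;
        $keptWords[] = $words;
    }
    return $kept;
}

$sample = [
    "hello this is my word file and this is line number 1",
    "hello this is second line and this is some text",
    "hello this is third line and again some text",
    "jhasg djgha sdgasjhgdjasgh jdkh",
    "sdhgfkjg sdjhgf sjkdghf sdhf",
    "s hdg fjhsgd fjhgsdj gfj ksdgh",
];
echo implode("\n", dropSimilarLines($sample)), "\n";
```

Under these assumptions the first "hello" line survives as the first occurrence, while the second and third are dropped.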
Upvotes: 0
Reputation: 54
#second
while(!feof($fileA))
#should be
while(!feof($fileB))
and
if($c<3)
fwrite($fileC, $line);
#should be
if($c<3){
fwrite($fileC, $line);
continue 2;
}
but "then compare that array which contains words of that line WITH all the words of next lines" only makes sense when you compare the file with itself!
EDIT: my post makes no sense at all, read the note from the previous post!
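For the two-file reading of the original code, the loop structure with `continue 2` could be sketched on in-memory arrays as follows. This is a sketch under assumptions, not a drop-in fix: the name linesNotMatching is hypothetical, the lines are assumed to be loaded already (for example with file()), and a line of the first list is written out only after every line of the second list has been checked. With file handles, the second file would additionally need a rewind() before each pass.

```php
<?php
// Hypothetical helper: returns the lines of $linesA that share fewer than
// $threshold words with every line of $linesB.
function linesNotMatching(array $linesA, array $linesB, int $threshold = 3): array
{
    $unique = [];
    foreach ($linesA as $line) {
        $words = array_unique(explode(' ', trim($line)));
        foreach ($linesB as $other) { // starts from the top for every $line
            $otherWords = array_unique(explode(' ', trim($other)));
            $c = 0; // number of words the two lines have in common
            foreach ($words as $w) {
                if (in_array($w, $otherWords, true)) {
                    $c++;
                }
            }
            if ($c >= $threshold) {
                continue 2; // match found: move on to the next $line
            }
        }
        $unique[] = $line; // no line of $linesB matched
    }
    return $unique;
}

$a = ["hello this is one", "foo bar baz"];
$b = ["hello this is two", "qux"];
echo implode("\n", linesNotMatching($a, $b)), "\n";
```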
Upvotes: 0