Reputation: 523
My question is very simple, but I couldn't find a proper answer anywhere. I need a way to read a .txt file and, if a line is duplicated, remove ALL copies of it rather than preserving one. For example, if a .txt file contains the following:
1234
1233
1232
1234
The output should be:
1233
1232
Because the code has to delete every copy of a duplicated line. I searched the web, but the results always point to answers that remove duplicated lines while preserving one of them.
I'm afraid the only way to do this is to read line x and scan the whole .txt; if an equal line is found, delete it and delete line x too. If not, move on to the next line. But the .txt file I'm checking has 50 million lines (~900 MB), and I don't know how much memory I need for this kind of task, so I'd appreciate some help here.
Upvotes: 0
Views: 1029
Reputation: 780843
Read the file line by line, and use the line contents as the key of an associative array whose values count the number of times the line appears. After you're done, write out all the lines whose count is exactly 1. This will require about as much memory as all the distinct lines.
$lines = array();
$fd = fopen("inputfile.txt", "r");
// compare against false explicitly so a falsy line such as "0" does not end the loop
while (($line = fgets($fd)) !== false) {
    $line = rtrim($line, "\r\n"); // ignore the newline
    if (array_key_exists($line, $lines)) {
        $lines[$line]++;
    } else {
        $lines[$line] = 1;
    }
}
fclose($fd);
$fd = fopen("outputfile.txt", "w");
foreach ($lines as $line => $count) {
    if ($count == 1) {
        fputs($fd, $line . PHP_EOL); // add the newlines back
    }
}
fclose($fd);
Upvotes: 3
Reputation: 223
I think I have a far more elegant solution:
$array = array('1', '1', '2', '2', '3', '4'); // array with some unique values, some not unique
$array_count_result = array_count_values($array); // count value occurrences
$result = array_keys(array_filter($array_count_result, function ($value) { return ($value == 1); })); // filter and isolate only unique values
print_r($result);
gives:
Array
(
    [0] => 3
    [1] => 4
)
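To connect this to the question, here is a minimal sketch that applies the same count-and-filter idea to the sample lines from the question. The helper name linesAppearingOnce is hypothetical, not part of the answer. One detail worth knowing: PHP converts numeric-string array keys such as "1234" to integers, so the sketch casts the keys back to strings.

```php
<?php
// Hypothetical helper (illustrative name): returns the values that occur exactly once.
function linesAppearingOnce(array $lines): array
{
    $counts = array_count_values($lines); // value => number of occurrences
    $once = array_filter($counts, function ($count) {
        return $count === 1;
    });
    // Numeric-string keys were converted to integers by PHP; cast them back.
    return array_map('strval', array_keys($once));
}

// Sample data taken from the question.
print_r(linesAppearingOnce(array('1234', '1233', '1232', '1234')));
```

For the question's sample this should leave only 1233 and 1232, in their original order, since array_count_values keeps keys in order of first appearance.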
Upvotes: 0
Reputation: 4334
I doubt there is one and only one function that does all of what you want to do. So, this breaks it down into steps...
First, can we load a file directly into an array? See the documentation for the file function.
$lines = file('mytextfile.txt');
Now, I have all of the lines in an array. I want to count how many of each entry I have. See the documentation for the array_count_values function.
$counts = array_count_values($lines);
Now, I can easily loop through the array and delete any entries where the count is greater than 1.
foreach ($counts as $value => $cnt) {
    if ($cnt > 1) {
        unset($counts[$value]);
    }
}
Now, I can turn the array keys (which are the values) into an array.
$nondupes = array_keys($counts);
Finally, I can write the contents out to a file.
file_put_contents('myoutputfile.txt', $nondupes);
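The steps above can be gathered into one function, sketched here under a few assumptions: the function name and its parameters are hypothetical, and the FILE_IGNORE_NEW_LINES flag is added so that a final line without a trailing newline still matches earlier copies of itself. Because the newlines are stripped on read, the sketch appends PHP_EOL again before writing.

```php
<?php
// Hypothetical wrapper around the steps above; name and parameters are illustrative.
function writeNonDuplicatedLines(string $inputPath, string $outputPath): void
{
    // Read every line into an array, without the trailing newline.
    $lines = file($inputPath, FILE_IGNORE_NEW_LINES);
    if ($lines === false) {
        throw new RuntimeException("Cannot read $inputPath");
    }

    $counts = array_count_values($lines); // line => number of occurrences
    $nondupes = array();
    foreach ($counts as $value => $cnt) {
        if ($cnt === 1) {
            $nondupes[] = $value . PHP_EOL; // add the newline back
        }
    }

    // file_put_contents accepts an array and writes its elements back to back.
    file_put_contents($outputPath, $nondupes);
}
```

Keep in mind that file() loads the whole file into memory at once, so for a file as large as the one in the question (about 50 million lines), reading line by line with fgets may be the safer choice.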
Upvotes: 0