Reputation: 191
There are two tables as below:-
In short, this process will allow the user to find and replace keywords based on the second table and only in the documents required.
The algorithm works as below:-
The process works fine and gives the desired results.
The problem begins when the data grows. As of now, there are around 50,000 entries in the first table and therefore the same number of files on the server.
The second table contains around 15,000 find-and-replace records, each with a long comma-separated string of document IDs.
With this much data, the process would run for days, which is not acceptable.
The database is MySQL 5.5 and the backend is PHP (Laravel 5.4). The OS is CentOS 7 with the nginx web server.
Is there a way to make this process smooth and less time-consuming? Any help is appreciated.
Upvotes: 0
Views: 191
Reputation: 108816
PHP has a function, shell_exec($shellCommand).
You may wish to use the GNU/Linux shell-accessible program called sed (stream editor) to do this substitution, rather than slurping each file into PHP and then writing it out again.
For example,
$result = shell_exec("cd what/ever/directory; sed 's/this/that/g' inputfile > outputfile");
will read what/ever/directory/inputfile, change every occurrence of "this" to "that", and write the result into what/ever/directory/outputfile. And it will do so very quickly compared to PHP.
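The sed command can be tried on its own from a shell before wiring it into PHP. This is a minimal sketch, assuming a scratch directory created with mktemp and a made-up one-line sample file; neither comes from the question.

```shell
#!/bin/sh
set -eu

# Scratch directory standing in for what/ever/directory (hypothetical).
dir=$(mktemp -d)
cd "$dir"

# A made-up sample document.
printf '%s\n' 'replace this and this' > inputfile

# Same substitution as in the shell_exec example: every "this" becomes "that".
sed 's/this/that/g' inputfile > outputfile

cat outputfile
```

With that sample input, outputfile contains the line "replace that and that".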
Edit: Why does this approach save a lot of time?
sed has been around for decades and is highly optimized. It uses far less processing power (far fewer CPU cycles) than PHP to do what it does, so the transformation of the files is faster.
sed is a stream editor. It reads, transforms, and writes as the data streams through, rather than loading a whole file first.
To get the most out of this approach, you'll need to get your PHP program to write more complex editing commands than 's/this/that/g'. You'll want to do multiple substitutions in a single sed run. You can do that by concatenating editing instructions, as in this example:
's/this/that/g; s/blue/azul/g; s/red/rojo/g'
A single shell command can be around 100K characters in length, so you probably won't hit limits on the length of those editing instructions.
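If the list of substitutions for a document grows long, one option is to write the editing instructions to a script file and pass it to sed with -f, which keeps them off the command line entirely. The sketch below assumes hypothetical file names (rules.sed, doc1.txt, doc1.out) and made-up keyword pairs. In the real process the PHP program would generate the script from the second table, and any keyword containing characters that are special to sed (such as / or &) would need escaping first.

```shell
#!/bin/sh
# Sketch: apply several find/replace pairs to one document in a single sed run.
# All file names and keyword pairs below are made up for illustration.
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# A sample document standing in for one of the files on the server.
printf '%s\n' 'this car is blue' 'this door is red' > doc1.txt

# One substitution per line; in practice PHP would write this file from
# the rows of the find/replace table that apply to this document.
cat > rules.sed <<'EOF'
s/this/that/g
s/blue/azul/g
s/red/rojo/g
EOF

# -f reads the editing instructions from the script file.
sed -f rules.sed doc1.txt > doc1.out

cat doc1.out
```

The rules are applied in order to each line, so a later rule can also match text produced by an earlier one; keyword pairs that overlap need to be ordered with that in mind.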
By suggesting the use of sed, I am suggesting a different algorithm.
Upvotes: 0