Reputation: 343
I have a folder with about 2 million files in it. I need to run the following commands:
sed -i 's/<title>/<item><title>/g;s/rel="nofollow"//g;s/<\/a> •/]]><\/wp:meta_value><\/wp:postmeta><content:encoded><![CDATA[/g;s/By <a href="http:\/\/www.website.com\/authors.*itemprop="author">/<wp:postmeta><wp:meta_key><![CDATA[custom_author]]><\/wp:meta_key><wp:meta_value><![CDATA[/g' /home/testing/*
sed -i '$a]]></content:encoded><wp:status><![CDATA[draft]]></wp:status><wp:post_type><![CDATA[post]]></wp:post_type><dc:creator><![CDATA[Database]]></dc:creator></item>\' /home/testing/*
awk -i inplace 1 ORS=' ' /home/testing/*
The problem is that the first command cycles through all 2 million files before I can start the second one, and so on, so I end up opening files 6 million times in total.
I'd prefer that each file be opened once, have all 3 commands run on it, and then have processing move on to the next file. Hopefully that makes sense.
Upvotes: 0
Views: 567
Reputation: 437933
Assuming that each of your files is small enough to fit into memory as a whole (and assuming GNU sed, which your use of -i without an option-argument implies):
sed -i -e ':a;$!{N;ba}; s/.../.../g; ...; $a...' -e 's/\n/ /g' /home/testing/*
s/.../.../g; ...; and $a... in the command above represent your actual substitution and append commands.
:a;$!{N;ba}; reads each input file into the pattern space as a whole; the remaining commands then perform the desired substitutions, the append, and the replacement of all newlines with a single space each.[1]
This allows you to make do with a single sed command that does all the work on each input file.
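As a rough illustration of the pattern above, here is a minimal sketch that runs it on a scratch file. The temporary directory, the sample content, and the shortened </item> trailer are made up for the demo and stand in for /home/testing/* and the real commands; GNU sed is assumed.

```shell
# Hypothetical scratch file standing in for one of the files in /home/testing/.
dir=$(mktemp -d)
printf '<title>Hello</title>\nbody text\n' > "$dir/sample.txt"

# One pass per file: slurp the file, substitute, queue the trailer, join lines.
# The text of the "a" command runs to the end of its -e chunk, so it goes last there.
sed -i -e ':a;$!{N;ba}; s/<title>/<item><title>/g; $a</item>' \
       -e 's/\n/ /g' "$dir/sample.txt"

result=$(cat "$dir/sample.txt")
printf '%s\n' "$result"
rm -r "$dir"
```

Note that text queued with a is written after the pattern space, so in this sketch the trailer ends up on a line of its own rather than being joined to the content with a space.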
[1] Your awk 1 ORS=' ' command actually creates output with a trailing space instead of a newline. By contrast, 's/\n/ /g' applied to the whole input file will only place a space between lines, and terminate the overall file with a newline (assuming the input file ended in one).
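The difference described in this footnote can be checked with a small sketch like the following. The two-line input is made up, GNU sed is assumed, and the sentinel x exists only so that command substitution does not strip the final byte being inspected.

```shell
# Hypothetical two-line input; the trailing "x" only protects the final
# byte from being stripped by command substitution.
awk_out=$(printf 'one\ntwo\n' | awk 1 ORS=' '; printf x)
sed_out=$(printf 'one\ntwo\n' | sed -e ':a;$!{N;ba}' -e 's/\n/ /g'; printf x)

# Dump both results byte by byte, without the sentinel "x".
printf '%s' "${awk_out%x}" | od -c
printf '%s' "${sed_out%x}" | od -c
```

The first dump should end in a space and the second in \n.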
Upvotes: 0
Reputation: 203532
You can do everything in one awk command with something like:
awk -i inplace -v ORS=' ' '{
gsub(/<title>/,"<item><title>")
gsub(/rel="nofollow"/,"")
gsub(/<\/a> •/,"]]><\/wp:meta_value><\/wp:postmeta><content:encoded><![CDATA[")
gsub(/By <a href="http:\/\/www.website.com\/authors.*itemprop="author">/,"<wp:postmeta><wp:meta_key><![CDATA[custom_author]]><\/wp:meta_key><wp:meta_value><![CDATA[")
print $0 "]]></content:encoded><wp:status><![CDATA[draft]]></wp:status><wp:post_type><![CDATA[post]]></wp:post_type><dc:creator><![CDATA[Database]]></dc:creator></item>"
}' /home/testing/*
but that doesn't mean it's necessarily the best way to do what you want.
The above relies on my correctly interpreting what your commands are doing and is obviously untested since you didn't provide any sample input and expected output. It also still relies on GNU awk for -i inplace like your original script did.
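If GNU awk is not available, one possible alternative is a shell loop that runs a POSIX awk once per file and replaces the file via a temporary copy. This is a different technique from -i inplace, not a drop-in equivalent: it is only a sketch, it starts one awk process per file (whether that is acceptable for 2 million files is not measured here), and the scratch directory, sample files, and shortened </item> trailer are hypothetical. A print in the main block runs for every input line, so text that should appear only once per file, as with sed's $a, goes into END here.

```shell
# Hypothetical scratch data standing in for /home/testing/*.
dir=$(mktemp -d)
printf '<title>A</title>\nline two\n' > "$dir/one.txt"
printf '<title>B</title>\n' > "$dir/two.txt"

for f in "$dir"/*; do
  # Substitute on each line, join lines with spaces (ORS),
  # and emit the trailer once per file in END.
  awk -v ORS=' ' '
    { gsub(/<title>/, "<item><title>"); print }
    END { print "</item>" }
  ' "$f" > "$f.tmp" && mv "$f.tmp" "$f"
done

one=$(cat "$dir/one.txt")
two=$(cat "$dir/two.txt")
printf '%s\n%s\n' "$one" "$two"
rm -r "$dir"
```

Each file is still read only once, and the result ends in a trailing space rather than a newline, as with the awk 1 ORS=' ' step in the question.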
Upvotes: 1