Reputation: 135
Before anyone flags it: I am confident this is not a duplicate of the existing question on adding a header to multiple files in Unix (Adding header into multiple text files). This question is about optimising the solution I am currently using.
I have numerous directories in which I have over 20000 files and for each file I want to add the same header.
What I have been doing is:
sed -i '1ichr\tpos\tref\talt\treffrq\tinfo\trs\tpval\teffalt\tgene' *.txt
Now, this does work exactly as I want it to, but there have been a couple of issues.
First is that this seems to be an extremely slow method of doing this and it can take a pretty long time to get through all 20K+ files.
Second, and more frustratingly, my connection to the server occasionally times out during this long process, so the command does not finish and I end up with some files having the header and others not. If I then started again from the top, some files would get the header twice, so I have to recreate the files and add the header to all of them at once.
So, what I am wondering is whether there is a better/quicker solution to this problem. The answer to the question I linked above seems like it would actually be slower (it loops over the files and does more work for each one), so it doesn't seem like it would fix this.
Upvotes: 1
Views: 3089
Reputation: 212356
Don't use -i. It confuses things when you get interrupted. Instead, use:
mkdir -p ../output-dir
for file in *.txt; do
sed '1ichr\tpos\tref\talt\treffrq\tinfo\trs\tpval\teffalt\tgene' "$file" > ../output-dir/"$file"
done
When you're done, you can rename the directories if you wish. This doesn't address the connection issue (ThoriumBR's suggestion of nohup is good for that), but when it happens you can recover state more easily.
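To make the "recover state more easily" part concrete, here is a minimal sketch of a resumable variant of the loop above. The function name add_header_resumable and its two directory arguments are hypothetical, not from the original answer, and it swaps sed for printf plus cat so it does not depend on GNU sed. It skips any file that already exists in the output directory and writes through a temporary file, so an interrupted run never leaves a half-written output under its final name.

```shell
#!/bin/sh
# Header with \t escapes; printf '%b' below expands them to real tabs.
HEADER='chr\tpos\tref\talt\treffrq\tinfo\trs\tpval\teffalt\tgene'

# Hypothetical helper: add_header_resumable SRC_DIR OUT_DIR
# Copies each SRC_DIR/*.txt to OUT_DIR with the header prepended.
add_header_resumable() {
    src_dir=$1
    out_dir=$2
    mkdir -p "$out_dir" || return 1
    for file in "$src_dir"/*.txt; do
        [ -e "$file" ] || continue      # the glob matched nothing
        out=$out_dir/${file##*/}
        [ -e "$out" ] && continue       # already finished in an earlier run
        # Write to a temporary name first, then rename, so a file that
        # exists under its final name is always complete.
        { printf '%b\n' "$HEADER"; cat "$file"; } > "$out.tmp" &&
            mv "$out.tmp" "$out"
    done
}
```

After a dropped connection, calling add_header_resumable . ../output-dir again should pick up where the previous run stopped, since completed files are skipped and any leftover .tmp file is simply overwritten.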
Upvotes: 1
Reputation: 940
First, adding a header is slow. You have to move the entire file contents to add something at the start. Adding a trailer would be very fast.
Second, use nohup:
nohup - run a command immune to hangups, with output to a non-tty
Running
nohup sed -i '1ichr\tpos\tref\talt\treffrq\tinfo\trs\tpval\teffalt\tgene' *.txt &
will keep the command running even if the server times you out; the trailing & puts it in the background so you get your prompt back.
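nohup keeps the job alive, but it does not help if the job was already interrupted once and has to be rerun. As a rough sketch (the function name add_header_once is made up for illustration), an in-place version can check the first line of each file and skip files that already carry the header, so rerunning it over the same directory should not add the header twice:

```shell
#!/bin/sh
# The header as a literal line (printf expands \t to real tabs).
HEADER_LINE=$(printf 'chr\tpos\tref\talt\treffrq\tinfo\trs\tpval\teffalt\tgene')

# Hypothetical helper: add_header_once FILE...
# Prepends the header to each file unless its first line is already the header.
add_header_once() {
    for file in "$@"; do
        [ -f "$file" ] || continue
        first=$(head -n 1 "$file")
        [ "$first" = "$HEADER_LINE" ] && continue   # header already present
        # Build the new file under a temporary name, then replace the original.
        { printf '%s\n' "$HEADER_LINE"; cat "$file"; } > "$file.tmp" &&
            mv "$file.tmp" "$file"
    done
}
```

Assuming the function is saved in a script (say add_header.sh, a hypothetical name) that ends with add_header_once "$@", it could be started with nohup sh add_header.sh *.txt > add_header.log 2>&1 & and safely rerun after a disconnect.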
Upvotes: 0