Sabor117

Reputation: 135

The best way in Unix to add a header to multiple files in a directory?

Before anyone flags this: I am confident it is not a duplicate of the existing question on adding a header to multiple files in Unix (the question is here: Adding header into multiple text files). This question is about optimising the solution I am already using.

I have numerous directories containing over 20,000 files, and I want to add the same header to every file.

What I have been doing is:

sed -i '1ichr\tpos\tref\talt\treffrq\tinfo\trs\tpval\teffalt\tgene' *.txt

Now, this does work exactly as I want it to, but there have been a couple of issues.

First, this seems to be an extremely slow way of doing it, and it can take a long time to get through all 20K+ files.

Second, and more frustratingly, my connection to the server occasionally times out during this long process, so the command never finishes and I end up with some files having the header and the rest not. If I simply started again from the top, the files that were already processed would get the header twice, so I have to recreate the files from scratch before I can add the header to all of them in one go.

So, what I am wondering is whether there is a better/quicker solution to this problem. The answer in the question I linked above looks like it would actually be slower (it loops over the files, so the command line seems to have more to do for each one), so I don't think it would fix this.

Upvotes: 1

Views: 3089

Answers (2)

William Pursell

Reputation: 212356

Don't use -i. When the run gets interrupted, it is hard to tell which files have already been modified. Instead, write the results to a separate directory:

mkdir -p ../output-dir
for file in *.txt; do 
  sed '1ichr\tpos\tref\talt\treffrq\tinfo\trs\tpval\teffalt\tgene' "$file" > ../output-dir/"$file"
done

When you're done, you can rename the directories if you wish. This doesn't address the connection issue (ThoriumBR's suggestion of nohup is good for that), but when it happens you can recover state more easily.
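To make the recovery step concrete, here is a minimal sketch of a restartable variant of the loop above. The function name add_header and its two directory arguments are made up for illustration, and it uses printf and cat instead of sed to prepend the line (a swapped-in technique, not the command from the question). It skips any file that already exists in the output directory, and it writes to a temporary name first so a half-written file is never mistaken for a finished one.

```shell
#!/bin/sh
# Hypothetical helper (name and arguments are assumptions): add_header SRC_DIR OUT_DIR
# Copies every *.txt from SRC_DIR into OUT_DIR with the header line prepended.
# Files already present in OUT_DIR are skipped, so an interrupted run can be restarted.
add_header() {
    src=$1
    out=$2
    mkdir -p "$out" || return 1
    for file in "$src"/*.txt; do
        [ -e "$file" ] || continue          # the glob matched nothing
        name=${file##*/}                    # strip the directory part
        [ -e "$out/$name" ] && continue     # finished in an earlier run
        # Write under a temporary name, then rename: the final name only
        # appears once the whole file has been written.
        { printf 'chr\tpos\tref\talt\treffrq\tinfo\trs\tpval\teffalt\tgene\n'
          cat "$file"; } > "$out/$name.tmp" &&
            mv "$out/$name.tmp" "$out/$name"
    done
}
```

Calling add_header . ../output-dir from inside the data directory would mirror the loop above. One difference to be aware of: this sketch also puts the header into empty input files, whereas sed '1i...' leaves an empty file empty.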

Upvotes: 1

ThoriumBR

Reputation: 940

First, adding a header is slow. You have to move the entire file contents to add something at the start. Adding a trailer would be very fast.

Second, use nohup:

nohup - run a command immune to hangups, with output to a non-tty

Using nohup sed -i '1ichr\tpos\tref\talt\treffrq\tinfo\trs\tpval\teffalt\tgene' *.txt & will keep the command running even if the server times you out. nohup makes the command ignore the hangup signal; the trailing & is what puts it in the background.
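If the job does get cut off anyway, re-running it should not add the header a second time. A minimal sketch of one way to guard against that, assuming a made-up helper name add_header_once: it compares each file's first line with the header and only rewrites files that do not have it yet. Like sed -i, it replaces each file by writing a temporary copy and renaming it.

```shell
#!/bin/sh
# Hypothetical helper (name is an assumption): add_header_once FILE...
# Prepends the header only to files whose first line is not already the header,
# so running it again after an interruption does not duplicate the header.
add_header_once() {
    # Build the header with real tab characters for the comparison below.
    header=$(printf 'chr\tpos\tref\talt\treffrq\tinfo\trs\tpval\teffalt\tgene')
    for file in "$@"; do
        [ -f "$file" ] || continue
        first=$(head -n 1 "$file")
        [ "$first" = "$header" ] && continue    # header already present
        # Write a temporary copy with the header, then rename it over the original.
        { printf '%s\n' "$header"; cat "$file"; } > "$file.tmp" &&
            mv "$file.tmp" "$file"
    done
}
```

Saved in a script (say add_header.sh, an illustrative name) whose last line is add_header_once *.txt, this could be started with nohup sh add_header.sh > add_header.log 2>&1 & and simply started again if anything goes wrong.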

Upvotes: 0
