user2997397

Reputation: 59

SED command to change the header

I have about 114 files that I want to join side by side on the first column that they all share, which is the ID number. Each file consists of 2 columns and over 400000 lines. I used write.table to join those tables together into one table, and I got X's in my header. For example, my header should look like this:

ID 1_sample1 2_sample2 3_sample3

But I get it like this:

ID X1_sample1 X2_sample2 X3_sample3

I read about this problem and found that check.names gets rid of it, but in my case, when I use check.names, I get the following error:

"unused argument (check.name = F)"

So I decided to use sed to fix the problem. It actually works great, but it joins the 2nd line onto the 1st line. For instance, my first and second lines should look something like this:

ID 1_sample1 2_sample2 3_sample

cg123 .0235 2.156 -5.546

But I get the following instead:

ID 1_sample1 2_sample2 3_sample cg123 .0235 2.156 -5.546

Can anyone check this code for me, please? I might have done something wrong that keeps the lines from staying separate.

head -n 1 inFILE | tr "\t" "\n" | sed -e 's/^X//g' | sed -e 's/\./-/' | sed -e 's/\./(/' |sed -e 's/\./)/' | tr "\n" "\t" > outFILE
tail -n +2 beta.norm.txt >> outFILE

Upvotes: 1

Views: 2917

Answers (1)

Floris

Reputation: 46415

If your data is tab delimited, the simple fix would be

sed '1,1s/\tX/\t/g' < inputfile > outputfile

1,1     only operate on the range "line 1 to line 1"
\tX     find tab followed by X
/\t/    replace with tab
g       all occurrences
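
To try this on a small sample before running it on the real file, a sketch along these lines might help. The scratch directory, the file names sample.txt and fixed.txt, and the data are all made up for illustration. It also keeps a literal tab in a shell variable, on the assumption that your sed may not understand \t (GNU sed does, some others may not).

```shell
# Work in a scratch directory; all names here are made up for the demo.
demo_dir=$(mktemp -d)
printf 'ID\tX1_sample1\tX2_sample2\ncg123\t.0235\t2.156\n' > "$demo_dir/sample.txt"

# Hold a literal tab in a variable so sed does not need to understand \t.
tab=$(printf '\t')
sed "1s/${tab}X/${tab}/g" "$demo_dir/sample.txt" > "$demo_dir/fixed.txt"

cat "$demo_dir/fixed.txt"
```

Only line 1 is touched, so an X elsewhere in the file is left alone.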

It does seem as though your original attempt does more than just strip the X: it also changes successive dots to -, ( and ), but you don't show in your example why you need that. The reason your code joins the first two lines is that your last tr command replaces every \n with \t, including the one at the end of the header, which leaves you with no \n at the end of the line.
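
A quick way to see the missing newline for yourself is to count the newline characters that survive the pipeline. This is only a sketch on a made-up header line, with the dot substitutions left out for brevity:

```shell
# Feed one made-up header line through the same tr/sed/tr round trip
# and count the newline characters that are left at the end.
newlines=$(printf 'ID\tX1_sample1\tX2_sample2\n' |
  tr '\t' '\n' | sed 's/^X//' | tr '\n' '\t' | wc -l | tr -d ' ')
echo "newlines after pipeline: $newlines"
```

Since wc -l counts newline characters, a count of 0 means whatever gets appended next lands on the same line as the header.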

You need to attach a \n at the end of your first line before concatenating lines 2 and beyond with your second command. Experiment with

head -n 1 inFILE | tr "\t" "\n" | sed -e 's/^X//g' | sed -e 's/\./-/' | sed -e 's/\./(/' |sed -e 's/\./)/' | tr "\n" "\t" > outFILE
printf '\n' >> outFILE
tail -n +2 beta.norm.txt >> outFILE

printf is used here because whether echo interprets "\n" depends on your shell. There are other ways to add a newline...
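
One of those other ways, sketched here under some assumptions, is to rejoin the header fields with paste -s instead of the final tr. paste joins the lines with tabs (its default delimiter) and ends its output with a newline, so no separate step is needed and no trailing tab is left on the header. The scratch directory and sample data are made up, and the dot substitutions are left out to keep it short.

```shell
# Scratch copies of inFILE/outFILE with made-up sample data.
workdir=$(mktemp -d)
in="$workdir/inFILE"
out="$workdir/outFILE"
printf 'ID\tX1_sample1\tX2_sample2\ncg123\t.0235\t2.156\n' > "$in"

# Split the header into one field per line, strip the leading X,
# then let paste -s join the fields back with tabs and a final newline.
head -n 1 "$in" | tr '\t' '\n' | sed 's/^X//' | paste -s - > "$out"

# Append everything from line 2 onwards.
tail -n +2 "$in" >> "$out"
cat "$out"
```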

Edit: using awk is probably much cleaner, for example:

awk '(NR==1){gsub(" X"," ", $0);}{print;}' inputFile > outputFile

Explanation:

(NR==1)                for the first line only (record number == 1) do:
{gsub(" X"," ", $0);}  do a global substitution of "space followed by X" with "space"

                       for all lines (including the one that was just modified) do:
{print;}               print the whole line
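
The awk command above matches a space before the X. If the file is tab delimited, as assumed for the sed version, a variant along these lines might be what you want. It is a sketch on made-up sample data, with the result kept in a hypothetical variable called fixed just so it can be inspected:

```shell
# Same idea as the awk one-liner, but matching a tab before the X.
# gsub with two arguments works on the whole line ($0).
fixed=$(printf 'ID\tX1_sample1\tX2_sample2\ncg123\t.0235\t2.156\n' |
  awk 'NR==1 { gsub(/\tX/, "\t") } { print }')
printf '%s\n' "$fixed"
```

For a real file you would pass inputFile to awk and redirect to outputFile, as in the command above.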

Upvotes: 2
