abs
abs

Reputation: 51

How to merge multiple text files by column?

I have 20 files in my directory, now like to merge the columns of the files by column 1. Examples of files are given below:

Note: column 1 (#rname) is not sorted but has a similar number of rows over the input files. Column 2 can be ignored from the input files when generating the output file.

File1:

#rname  startpos    endpos  numreads    covbases    coverage    meandepth   meanbaseq   meanmapq
CFLAU10s2|kraken:taxid|33189    1   37904   6   375 0.989341    0.0157503   35.6    31
CFLAU10s4|kraken:taxid|33189    1   37174   2   222 0.597192    0.00812396  35.7    45
CFLAU10s5|kraken:taxid|33189    1   37080   2   70  0.188781    0.00377562  35.7    28
CFLAU10s6|kraken:taxid|33189    1   36175   4   145 0.400829    0.00801659  36.9    25.5
CFLAU10s8|kraken:taxid|33189    1   35457   4   155 0.437149    0.00874298  37  25.5
CFLAU10s9|kraken:taxid|33189    1   34752   6   111 0.319406    0.0191644   36.2    26
CFLAU10s13|kraken:taxid|33189   1   32255   2   156 0.483646    0.00917687  35.2    20
CFLAU10s14|kraken:taxid|33189   1   31915   4   63  0.197399    0.00695598  36.7    26.5

file2:

#rname  startpos    endpos  numreads    covbases    coverage    meandepth   meanbaseq   meanmapq
CFLAU10s2|kraken:taxid|33189    1   26432   8   318 1.20309 0.0361683   35.9    35
CFLAU10s4|kraken:taxid|33189    1   26105   2   91  0.348592    0.00697184  35  37
CFLAU10s5|kraken:taxid|33189    1   25631   4   122 0.475986    0.00951972  35.7    26
CFLAU10s6|kraken:taxid|33189    1   25014   2   64  0.255857    0.00511713  34.7    28
CFLAU10s8|kraken:taxid|33189    1   24803   522 10847   43.7326 2.22175 36.3    32.8
CFLAU10s9|kraken:taxid|33189    1   24139   2   59  0.244418    0.00488835  37  43
CFLAU10s13|kraken:taxid|33189   1   23391   24  1215    5.19431 0.109273    36.1    35.5
CFLAU10s14|kraken:taxid|33189   1   22963   12  318 1.38484 0.0531289   36.8    15.3

expected output_file:

#rname  file1_endpos    file1_numreads  file1_covbases  file1_coverage  file1_meandepth file1_meanbaseq file1_meanmapq  file2_endpos    file2_numreads  file2_covbases  file2_coverage  file2_meandepth file2_meanbaseq file2_meanmapq
CFLAU10s2|kraken:taxid|33189    37904   6   375 0.989341    0.0157503   35.6    31  26432   8   318 1.20309 0.0361683   35.9    35
CFLAU10s4|kraken:taxid|33189    37174   2   222 0.597192    0.00812396  35.7    45  26105   2   91  0.348592    0.00697184  35  37
CFLAU10s5|kraken:taxid|33189    37080   2   70  0.188781    0.00377562  35.7    28  25631   4   122 0.475986    0.00951972  35.7    26
CFLAU10s6|kraken:taxid|33189    36175   4   145 0.400829    0.00801659  36.9    25.5    25014   2   64  0.255857    0.00511713  34.7    28
CFLAU10s8|kraken:taxid|33189    35457   4   155 0.437149    0.00874298  37  25.5    24803   522 10847   43.7326 2.22175 36.3    32.8
CFLAU10s9|kraken:taxid|33189    34752   6   111 0.319406    0.0191644   36.2    26  24139   2   59  0.244418    0.00488835  37  43
CFLAU10s13|kraken:taxid|33189   32255   2   156 0.483646    0.00917687  35.2    20  23391   24  1215    5.19431 0.109273    36.1    35.5
CFLAU10s14|kraken:taxid|33189   31915   4   63  0.197399    0.00695598  36.7    26.5    22963   12  318 1.38484 0.0531289   36.8    15.3

Upvotes: 0

Views: 1978

Answers (3)

sseLtaH
sseLtaH

Reputation: 11227

Using awk

$ awk 'NR==FNR{array[$1]=$3FS$4FS$5FS$6FS$7FS$8; next} { print $0,array[$1]}' file2 file1
#rname  startpos    endpos  numreads    covbases    coverage    meandepth   meanbaseq   meanmapq endpos numreads covbases coverage meandepth meanbaseq
CFLAU10s2|kraken:taxid|33189    1   37904   6   375 0.989341    0.0157503   35.6    31 26432 8 318 1.20309 0.0361683 35.9
CFLAU10s4|kraken:taxid|33189    1   37174   2   222 0.597192    0.00812396  35.7    45 26105 2 91 0.348592 0.00697184 35
CFLAU10s5|kraken:taxid|33189    1   37080   2   70  0.188781    0.00377562  35.7    28 25631 4 122 0.475986 0.00951972 35.7
CFLAU10s6|kraken:taxid|33189    1   36175   4   145 0.400829    0.00801659  36.9    25.5 25014 2 64 0.255857 0.00511713 34.7
CFLAU10s8|kraken:taxid|33189    1   35457   4   155 0.437149    0.00874298  37  25.5 24803 522 10847 43.7326 2.22175 36.3
CFLAU10s9|kraken:taxid|33189    1   34752   6   111 0.319406    0.0191644   36.2    26 24139 2 59 0.244418 0.00488835 37
CFLAU10s13|kraken:taxid|33189   1   32255   2   156 0.483646    0.00917687  35.2    20 23391 24 1215 5.19431 0.109273 36.1
CFLAU10s14|kraken:taxid|33189   1   31915   4   63  0.197399    0.00695598  36.7    26.5 22963 12 318 1.38484 0.0531289 36.8

Upvotes: 3

HOH_HOH
HOH_HOH

Reputation: 125

I think you can try the column command, and tell it to use tabs to separate columns

paste file1 file2 | column -s $'\t' -t

Upvotes: 0

Marco
Marco

Reputation: 1232

I don't know a single command to do this, but you can do it more programmatically like this and get the task done.

For this first run a

IFS="
"

kram@00w87:/tmp$ for each in `grep -v "#" file1`;do v=`grep \`echo "$each"|cut -d"|" -f1\` file2`;echo "$each $v";done
CFLAU10s2|kraken:taxid|33189    1   37904   6   375 0.989341    0.0157503   35.6    31 CFLAU10s2|kraken:taxid|33189    1   26432   8   318 1.20309 0.0361683   35.9    35
CFLAU10s4|kraken:taxid|33189    1   37174   2   222 0.597192    0.00812396  35.7    45 CFLAU10s4|kraken:taxid|33189    1   26105   2   91  0.348592    0.00697184  35  37
CFLAU10s5|kraken:taxid|33189    1   37080   2   70  0.188781    0.00377562  35.7    28 CFLAU10s5|kraken:taxid|33189    1   25631   4   122 0.475986    0.00951972  35.7    26
CFLAU10s6|kraken:taxid|33189    1   36175   4   145 0.400829    0.00801659  36.9    25.5 CFLAU10s6|kraken:taxid|33189    1   25014   2   64  0.255857    0.00511713  34.7    28
CFLAU10s8|kraken:taxid|33189    1   35457   4   155 0.437149    0.00874298  37  25.5 CFLAU10s8|kraken:taxid|33189    1   24803   522 10847   43.7326 2.22175 36.3    32.8
CFLAU10s9|kraken:taxid|33189    1   34752   6   111 0.319406    0.0191644   36.2    26 CFLAU10s9|kraken:taxid|33189    1   24139   2   59  0.244418    0.00488835  37  43
CFLAU10s13|kraken:taxid|33189   1   32255   2   156 0.483646    0.00917687  35.2    20 CFLAU10s13|kraken:taxid|33189   1   23391   24  1215    5.19431 0.109273    36.1    35.5
CFLAU10s14|kraken:taxid|33189   1   31915   4   63  0.197399    0.00695598  36.7    26.5 CFLAU10s14|kraken:taxid|33189   1   22963   12  318 1.38484 0.0531289   36.8    15.3

then you just need to add the header and that's it!

This command will merge it all, then you need to drop columns using cut for example, or even awk. I think awk will do it quicker, this is just a preview.

for each in `grep -v "#" file1`;do v=`grep \`echo "$each"|cut -d"|" -f1\` file2`;echo "$each $v";done | awk '{print $1" "$8" "$9}'
CFLAU10s2|kraken:taxid|33189 35.6 31
CFLAU10s4|kraken:taxid|33189 35.7 45
CFLAU10s5|kraken:taxid|33189 35.7 28
CFLAU10s6|kraken:taxid|33189 36.9 25.5
CFLAU10s8|kraken:taxid|33189 37 25.5
CFLAU10s9|kraken:taxid|33189 36.2 26
CFLAU10s13|kraken:taxid|33189 35.2 20
CFLAU10s14|kraken:taxid|33189 36.7 26.5

Upvotes: 1

Related Questions