Reputation: 51
I have 20 files in my directory, now like to merge the columns of the files by column 1. Examples of files are given below:
Note: column 1 (#rname) is not sorted but has a similar number of rows over the input files. Column 2 can be ignored from the input files when generating the output file.
File1:
#rname startpos endpos numreads covbases coverage meandepth meanbaseq meanmapq
CFLAU10s2|kraken:taxid|33189 1 37904 6 375 0.989341 0.0157503 35.6 31
CFLAU10s4|kraken:taxid|33189 1 37174 2 222 0.597192 0.00812396 35.7 45
CFLAU10s5|kraken:taxid|33189 1 37080 2 70 0.188781 0.00377562 35.7 28
CFLAU10s6|kraken:taxid|33189 1 36175 4 145 0.400829 0.00801659 36.9 25.5
CFLAU10s8|kraken:taxid|33189 1 35457 4 155 0.437149 0.00874298 37 25.5
CFLAU10s9|kraken:taxid|33189 1 34752 6 111 0.319406 0.0191644 36.2 26
CFLAU10s13|kraken:taxid|33189 1 32255 2 156 0.483646 0.00917687 35.2 20
CFLAU10s14|kraken:taxid|33189 1 31915 4 63 0.197399 0.00695598 36.7 26.5
file2:
#rname startpos endpos numreads covbases coverage meandepth meanbaseq meanmapq
CFLAU10s2|kraken:taxid|33189 1 26432 8 318 1.20309 0.0361683 35.9 35
CFLAU10s4|kraken:taxid|33189 1 26105 2 91 0.348592 0.00697184 35 37
CFLAU10s5|kraken:taxid|33189 1 25631 4 122 0.475986 0.00951972 35.7 26
CFLAU10s6|kraken:taxid|33189 1 25014 2 64 0.255857 0.00511713 34.7 28
CFLAU10s8|kraken:taxid|33189 1 24803 522 10847 43.7326 2.22175 36.3 32.8
CFLAU10s9|kraken:taxid|33189 1 24139 2 59 0.244418 0.00488835 37 43
CFLAU10s13|kraken:taxid|33189 1 23391 24 1215 5.19431 0.109273 36.1 35.5
CFLAU10s14|kraken:taxid|33189 1 22963 12 318 1.38484 0.0531289 36.8 15.3
expected output_file:
#rname file1_endpos file1_numreads file1_covbases file1_coverage file1_meandepth file1_meanbaseq file1_meanmapq file2_endpos file2_numreads file2_covbases file2_coverage file2_meandepth file2_meanbaseq file2_meanmapq
CFLAU10s2|kraken:taxid|33189 37904 6 375 0.989341 0.0157503 35.6 31 26432 8 318 1.20309 0.0361683 35.9 35
CFLAU10s4|kraken:taxid|33189 37174 2 222 0.597192 0.00812396 35.7 45 26105 2 91 0.348592 0.00697184 35 37
CFLAU10s5|kraken:taxid|33189 37080 2 70 0.188781 0.00377562 35.7 28 25631 4 122 0.475986 0.00951972 35.7 26
CFLAU10s6|kraken:taxid|33189 36175 4 145 0.400829 0.00801659 36.9 25.5 25014 2 64 0.255857 0.00511713 34.7 28
CFLAU10s8|kraken:taxid|33189 35457 4 155 0.437149 0.00874298 37 25.5 24803 522 10847 43.7326 2.22175 36.3 32.8
CFLAU10s9|kraken:taxid|33189 34752 6 111 0.319406 0.0191644 36.2 26 24139 2 59 0.244418 0.00488835 37 43
CFLAU10s13|kraken:taxid|33189 32255 2 156 0.483646 0.00917687 35.2 20 23391 24 1215 5.19431 0.109273 36.1 35.5
CFLAU10s14|kraken:taxid|33189 31915 4 63 0.197399 0.00695598 36.7 26.5 22963 12 318 1.38484 0.0531289 36.8 15.3
Upvotes: 0
Views: 1978
Reputation: 11227
Using awk
$ awk 'NR==FNR{array[$1]=$3FS$4FS$5FS$6FS$7FS$8; next} { print $0,array[$1]}' file2 file1
#rname startpos endpos numreads covbases coverage meandepth meanbaseq meanmapq endpos numreads covbases coverage meandepth meanbaseq
CFLAU10s2|kraken:taxid|33189 1 37904 6 375 0.989341 0.0157503 35.6 31 26432 8 318 1.20309 0.0361683 35.9
CFLAU10s4|kraken:taxid|33189 1 37174 2 222 0.597192 0.00812396 35.7 45 26105 2 91 0.348592 0.00697184 35
CFLAU10s5|kraken:taxid|33189 1 37080 2 70 0.188781 0.00377562 35.7 28 25631 4 122 0.475986 0.00951972 35.7
CFLAU10s6|kraken:taxid|33189 1 36175 4 145 0.400829 0.00801659 36.9 25.5 25014 2 64 0.255857 0.00511713 34.7
CFLAU10s8|kraken:taxid|33189 1 35457 4 155 0.437149 0.00874298 37 25.5 24803 522 10847 43.7326 2.22175 36.3
CFLAU10s9|kraken:taxid|33189 1 34752 6 111 0.319406 0.0191644 36.2 26 24139 2 59 0.244418 0.00488835 37
CFLAU10s13|kraken:taxid|33189 1 32255 2 156 0.483646 0.00917687 35.2 20 23391 24 1215 5.19431 0.109273 36.1
CFLAU10s14|kraken:taxid|33189 1 31915 4 63 0.197399 0.00695598 36.7 26.5 22963 12 318 1.38484 0.0531289 36.8
Upvotes: 3
Reputation: 125
I think you can try the column command, and tell it to use tabs to separate columns
paste file1 file2 | column -s $'\t' -t
Upvotes: 0
Reputation: 1232
I don't know a single command to do this, but you can do it more programmatically like this and get the task done.
For this first run a
IFS="
"
kram@00w87:/tmp$ for each in `grep -v "#" file1`;do v=`grep \`echo "$each"|cut -d"|" -f1\` file2`;echo "$each $v";done
CFLAU10s2|kraken:taxid|33189 1 37904 6 375 0.989341 0.0157503 35.6 31 CFLAU10s2|kraken:taxid|33189 1 26432 8 318 1.20309 0.0361683 35.9 35
CFLAU10s4|kraken:taxid|33189 1 37174 2 222 0.597192 0.00812396 35.7 45 CFLAU10s4|kraken:taxid|33189 1 26105 2 91 0.348592 0.00697184 35 37
CFLAU10s5|kraken:taxid|33189 1 37080 2 70 0.188781 0.00377562 35.7 28 CFLAU10s5|kraken:taxid|33189 1 25631 4 122 0.475986 0.00951972 35.7 26
CFLAU10s6|kraken:taxid|33189 1 36175 4 145 0.400829 0.00801659 36.9 25.5 CFLAU10s6|kraken:taxid|33189 1 25014 2 64 0.255857 0.00511713 34.7 28
CFLAU10s8|kraken:taxid|33189 1 35457 4 155 0.437149 0.00874298 37 25.5 CFLAU10s8|kraken:taxid|33189 1 24803 522 10847 43.7326 2.22175 36.3 32.8
CFLAU10s9|kraken:taxid|33189 1 34752 6 111 0.319406 0.0191644 36.2 26 CFLAU10s9|kraken:taxid|33189 1 24139 2 59 0.244418 0.00488835 37 43
CFLAU10s13|kraken:taxid|33189 1 32255 2 156 0.483646 0.00917687 35.2 20 CFLAU10s13|kraken:taxid|33189 1 23391 24 1215 5.19431 0.109273 36.1 35.5
CFLAU10s14|kraken:taxid|33189 1 31915 4 63 0.197399 0.00695598 36.7 26.5 CFLAU10s14|kraken:taxid|33189 1 22963 12 318 1.38484 0.0531289 36.8 15.3
then you just need to add the header and that's it!
This command will merge it all, then you need to drop columns using cut for example, or even awk. I think awk will do it quicker, this is just a preview.
for each in `grep -v "#" file1`;do v=`grep \`echo "$each"|cut -d"|" -f1\` file2`;echo "$each $v";done | awk '{print $1" "$8" "$9}'
CFLAU10s2|kraken:taxid|33189 35.6 31
CFLAU10s4|kraken:taxid|33189 35.7 45
CFLAU10s5|kraken:taxid|33189 35.7 28
CFLAU10s6|kraken:taxid|33189 36.9 25.5
CFLAU10s8|kraken:taxid|33189 37 25.5
CFLAU10s9|kraken:taxid|33189 36.2 26
CFLAU10s13|kraken:taxid|33189 35.2 20
CFLAU10s14|kraken:taxid|33189 36.7 26.5
Upvotes: 1