Reputation: 55
I am trying to duplicate every column in my file. The file is very large: 600,000 columns and 300 rows, tab-separated. Below is a small part of the file that I am working on.
rs 71_1203 71_1299 71_6634
40896 3 3 4
70786 2 2 4
116950 2 2 4
5891 3 3 4
6254 3 2 4
89308 2 2 4
116953 2 2 4
116956 2 2 4
20709 3 2 4
12524 2 2 4
12603 2 2 4
21074 2 2 1
42672 2 2 4
40972 2 2 4
21727 3 2 4
22163 2 2 4
22417 2 2 4
41216 2 2 4
41374 2 2 4
Now I want my file to look like this:
rs rs 71_1203 71_1203 71_1299 71_1299 71_6634 71_6634
40896 40896 3 3 3 3 4 4
70786 70786 2 2 2 2 4 4
116950 116950 2 2 2 2 4 4
5891 5891 3 3 3 3 4 4
6254 6254 3 3 2 2 4 4
89308 89308 2 2 2 2 4 4
116953 116953 2 2 2 2 4 4
116956 116956 2 2 2 2 4 4
20709 20709 3 3 2 2 4 4
12524 12524 2 2 2 2 4 4
12603 12603 2 2 2 2 4 4
21074 21074 2 2 2 2 1 1
42672 42672 2 2 2 2 4 4
40972 40972 2 2 2 2 4 4
21727 21727 3 3 2 2 4 4
22163 22163 2 2 2 2 4 4
22417 22417 2 2 2 2 4 4
41216 41216 2 2 2 2 4 4
41374 41374 2 2 2 2 4 4
All columns are duplicated. I did this using the following awk command:
awk 'BEGIN{FS=OFS="\t"} {$1 = $1 OFS $1} 1' try.txt |
awk 'BEGIN{FS=OFS="\t"} {$3 = $3 OFS $3} 1' |
awk 'BEGIN{FS=OFS="\t"} {$5 = $5 OFS $5} 1' |
awk 'BEGIN{FS=OFS="\t"} {$7 = $7 OFS $7} 1'
I know this command is fine for a small file, but with 600,000 columns this approach will not work.
Can someone help me do this in an easy way?
Thanks a lot for helping
Upvotes: 0
Views: 57
Reputation: 8711
You can try this Perl one-liner:
perl -lpe 's/$/\t/g; s/(\S+\s*)/$1$1/g ' input_file
With the given inputs:
$ cat rhkss.txt
rs 71_1203 71_1299 71_6634
40896 3 3 4
70786 2 2 4
116950 2 2 4
5891 3 3 4
6254 3 2 4
89308 2 2 4
116953 2 2 4
116956 2 2 4
20709 3 2 4
12524 2 2 4
12603 2 2 4
21074 2 2 1
42672 2 2 4
40972 2 2 4
21727 3 2 4
22163 2 2 4
22417 2 2 4
41216 2 2 4
41374 2 2 4
$ perl -lpe 's/$/\t/g; s/(\S+\s*)/$1$1/g ' rhkss.txt
rs rs 71_1203 71_1203 71_1299 71_1299 71_6634 71_6634
40896 40896 3 3 3 3 4 4
70786 70786 2 2 2 2 4 4
116950 116950 2 2 2 2 4 4
5891 5891 3 3 3 3 4 4
6254 6254 3 3 2 2 4 4
89308 89308 2 2 2 2 4 4
116953 116953 2 2 2 2 4 4
116956 116956 2 2 2 2 4 4
20709 20709 3 3 2 2 4 4
12524 12524 2 2 2 2 4 4
12603 12603 2 2 2 2 4 4
21074 21074 2 2 2 2 1 1
42672 42672 2 2 2 2 4 4
40972 40972 2 2 2 2 4 4
21727 21727 3 3 2 2 4 4
22163 22163 2 2 2 2 4 4
22417 22417 2 2 2 2 4 4
41216 41216 2 2 2 2 4 4
41374 41374 2 2 2 2 4 4
$
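To make the behaviour of the two substitutions easier to inspect, here is a rough Python approximation. It is only a sketch: the function names `perl_style_duplicate` and `split_join_duplicate` are made up for illustration, and Python's `re` is assumed to treat `\S` and `\s` the same way Perl does for this plain ASCII data. It suggests two things worth checking on real data: the appended tab stays at the end of every output line, and an empty field would not be duplicated as its own column.

```python
import re

def perl_style_duplicate(line):
    # Approximates the one-liner: append a tab, then double every
    # run of non-whitespace together with the whitespace after it.
    return re.sub(r"(\S+\s*)", r"\1\1", line + "\t")

def split_join_duplicate(line):
    # Hypothetical alternative: split on tabs and repeat each field,
    # which also handles empty fields and adds no trailing tab.
    return "\t".join(f + "\t" + f for f in line.split("\t"))

print(repr(perl_style_duplicate("rs\t71_1203")))   # 'rs\trs\t71_1203\t71_1203\t'
print(repr(split_join_duplicate("rs\t71_1203")))   # 'rs\trs\t71_1203\t71_1203'
```

For the sample data in the question there are no empty fields, so the only visible difference is the trailing tab.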
Upvotes: 2
Reputation: 37414
Using awk:
$ awk -v OFS="\t" '{
for(i=NF*2;i>1;i--) # from NF*2 down to 2
$i=((j=i/2)==int(j)?$j:$(++j)) # $i=$(ceil(i/2))
}1' file
Output:
rs rs 71_1203 71_1203 71_1299 71_1299 71_6634 71_6634
40896 40896 3 3 3 3 4 4
70786 70786 2 2 2 2 4 4
...
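The index arithmetic in that loop may be easier to follow outside awk. The sketch below mirrors it in Python on a list of fields; `duplicate_in_place` is a hypothetical name, and positions are kept 1-based in the comments to match awk's `$i`. Output position `i` takes its value from input position ceil(i/2), and walking from the highest position downwards means a field is never overwritten before it has been copied.

```python
def duplicate_in_place(fields):
    n = len(fields)
    fields.extend([""] * n)            # make room for 2*n fields
    for i in range(2 * n, 1, -1):      # 1-based positions 2n down to 2
        src = (i + 1) // 2             # ceil(i / 2) using integer arithmetic
        fields[i - 1] = fields[src - 1]
    return fields

print(duplicate_in_place(["rs", "71_1203", "71_1299"]))
# ['rs', 'rs', '71_1203', '71_1203', '71_1299', '71_1299']
```

Position 1 is skipped because it already holds the right value, which is why the awk loop stops at `i>1`.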
Upvotes: 0
Reputation: 2011
Python approach:
with open('input.txt') as f:
    # splitlines() drops the trailing newline so it does not end up inside the last field
    lines = f.read().splitlines()
duplicated_text = ['\t'.join(word + '\t' + word for word in line.split('\t')) for line in lines]
with open('output.txt', 'w') as f:
    f.write('\n'.join(duplicated_text) + '\n')
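Since the real file has 600,000 columns, it may be worth processing one line at a time instead of holding the input and the output lists in memory together. The following is a minimal sketch of that idea, not a tuned implementation: `duplicate_columns` is a hypothetical helper that takes any two file-like objects, and it assumes Unix line endings (a `\r\n` file would need the `\r` stripped as well). The newline is removed before splitting so the last field is duplicated cleanly.

```python
import io

def duplicate_columns(src, dst, sep="\t"):
    # Stream line by line: only one row is held in memory at a time.
    for line in src:
        fields = line.rstrip("\n").split(sep)
        dst.write(sep.join(f + sep + f for f in fields) + "\n")

# Small in-memory demonstration; real use would pass open() file handles.
src = io.StringIO("rs\t71_1203\n40896\t3\n")
dst = io.StringIO()
duplicate_columns(src, dst)
print(repr(dst.getvalue()))
# 'rs\trs\t71_1203\t71_1203\n40896\t40896\t3\t3\n'
```

With real files this would be called as `duplicate_columns(fin, fout)` inside `with open('input.txt') as fin, open('output.txt', 'w') as fout:`, using the same file names as the answer above.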
Upvotes: 0
Reputation: 785286
You may use this awk command to duplicate all the columns separated by a tab:
awk 'BEGIN{FS=OFS="\t"} {
for (i=1; i<=NF; i++) printf "%s%s", $i OFS, $i (i < NF ? OFS : RS)}' file
rs rs 71_1203 71_1203 71_1299 71_1299 71_6634 71_6634
40896 40896 3 3 3 3 4 4
70786 70786 2 2 2 2 4 4
116950 116950 2 2 2 2 4 4
5891 5891 3 3 3 3 4 4
6254 6254 3 3 2 2 4 4
89308 89308 2 2 2 2 4 4
116953 116953 2 2 2 2 4 4
116956 116956 2 2 2 2 4 4
20709 20709 3 3 2 2 4 4
12524 12524 2 2 2 2 4 4
12603 12603 2 2 2 2 4 4
21074 21074 2 2 2 2 1 1
42672 42672 2 2 2 2 4 4
40972 40972 2 2 2 2 4 4
21727 21727 3 3 2 2 4 4
22163 22163 2 2 2 2 4 4
22417 22417 2 2 2 2 4 4
41216 41216 2 2 2 2 4 4
41374 41374 2 2 2 2 4 4
Upvotes: 1