Reputation: 479
If I have the input file below, is there a command or other way in Linux to convert it into the desired output file shown after it?
Input file:
Column_1 Column_2
scaffold_A SNP_marker1
scaffold_A SNP_marker2
scaffold_A SNP_marker3
scaffold_A SNP_marker4
scaffold_B SNP_marker5
scaffold_B SNP_marker6
scaffold_B SNP_marker7
scaffold_C SNP_marker8
scaffold_A SNP_marker9
scaffold_A SNP_marker10
Desired Output file:
Column_1 Column_2
scaffold_A SNP_marker1;SNP_marker2;SNP_marker3;SNP_marker4
scaffold_B SNP_marker5;SNP_marker6;SNP_marker7
scaffold_C SNP_marker8
scaffold_A SNP_marker9;SNP_marker10
I was thinking of using grep, uniq, etc., but I still couldn't figure out how to get this done.
Upvotes: 0
Views: 356
Reputation: 40688
If you don't mind using Python, it has itertools.groupby, which serves this purpose:
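# file: combine.py
import itertools

with open('data.txt') as f:
    # Split each non-blank line into its whitespace-separated columns
    data = [row.split() for row in f if row.strip()]

# groupby yields one group per run of consecutive rows sharing the same first column
for column1, rows_group in itertools.groupby(data, key=lambda row: row[0]):
    # A single formatted argument prints the same way under Python 2 and 3
    print('{} {}'.format(column1, ';'.join(row[1] for row in rows_group)))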
Save this script as combine.py. Assuming your input file is data.txt, run it to get your desired output:
python combine.py
The result of the with open(...) block is data, a list of rows; each row is itself a list of columns.
The itertools.groupby function takes an iterable, in this case a list. You tell it how to group rows together using a key, which here is the first column (column1).
Upvotes: 0
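To illustrate the grouping described above: itertools.groupby only merges consecutive items that share a key, which is why the second run of scaffold_A in the question ends up on its own line rather than being merged into the first. A minimal sketch, using a shortened copy of the sample data (the variable names rows and runs are just for illustration):

```python
from itertools import groupby

# Shortened sample from the question; note scaffold_A appears in two separate runs.
rows = [
    ("scaffold_A", "SNP_marker1"),
    ("scaffold_A", "SNP_marker2"),
    ("scaffold_B", "SNP_marker5"),
    ("scaffold_A", "SNP_marker9"),
]

# groupby starts a new group every time the key changes, so the later
# scaffold_A row is reported separately from the first two.
runs = [
    (key, [marker for _, marker in group])
    for key, group in groupby(rows, key=lambda row: row[0])
]

for key, markers in runs:
    print(key, ";".join(markers))
```

If all scaffold_A rows were supposed to end up on one line, the data would have to be sorted by the first column before grouping; the desired output in the question keeps the runs separate, so no sorting is done here.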
Reputation: 6240
You could also try the following solution in bash:
cat input.txt | while read L; do y=`echo $L | cut -f1 -d' '`; { test "$x" = "$y" && echo -n ";`echo $L | cut -f2 -d' '`"; } || { x="$y";echo -en "\n$L"; }; done
or, in a more human-readable form:
cat input.txt | while read L;
do
    y=`echo $L | cut -f1 -d' '`;
    {
        test "$x" = "$y" && echo -n ";`echo $L | cut -f2 -d' '`";
    } ||
    {
        x="$y"; echo -en "\n$L";
    };
done
Note that the formatting of the output relies on the -n and -e options of bash's built-in echo command. As written, the output starts with an empty line and does not end with a newline.
Upvotes: 0
Reputation: 1
awk solution within a bash script
#!/bin/bash
awk '
BEGIN{
    str = ""
}
{
    if ( str != $1 ) {
        if ( NR != 1 ){
            printf("\n")
        }
        str = $1
        printf("%s\t%s",$1,$2)
    } else if ( str == $1 ) {
        printf(";%s",$2)
    }
}
END{
    printf("\n")
}' your_file.txt
Upvotes: 0
Reputation: 51797
Python solution (the input and output file names are hard-coded here as infile and outfile; change them to suit):
from __future__ import print_function  # not needed with Python3

with open('infile') as infile, open('outfile', 'w') as outfile:
    outfile.write(infile.readline())  # transfer the header
    col_one, col_two = infile.readline().split()
    col_two = [col_two]  # make it a list
    for line in infile:
        data = line.split()
        if col_one != data[0]:
            print("{}\t{}".format(col_one, ';'.join(col_two)), file=outfile)
            col_one = data[0]
            col_two = [data[1]]
        else:
            col_two.append(data[1])
    print("{}\t{}".format(col_one, ';'.join(col_two)), file=outfile)
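The code above opens files literally named infile and outfile. A minimal sketch of a variant that takes both paths from the command line might look like the following. The function names merge_runs and main, the script name, and the argument order are assumptions made for illustration, not part of the original answer. Unlike the original, this version treats the header as an ordinary one-line run (so it is written tab-separated) and skips blank or malformed lines.

```python
import sys


def merge_runs(lines):
    """Yield (key, values) for each run of consecutive lines sharing a first column."""
    key, values = None, []
    for line in lines:
        fields = line.split()
        if len(fields) < 2:
            continue  # assumption: silently skip blank or malformed lines
        if values and fields[0] != key:
            yield key, values
            values = []  # start a fresh list so the yielded one is not mutated
        key = fields[0]
        values.append(fields[1])
    if values:
        yield key, values  # emit the final run


def main(argv):
    # Assumed invocation: python merge_runs.py INPUT OUTPUT
    in_path, out_path = argv[1], argv[2]
    with open(in_path) as infile, open(out_path, "w") as outfile:
        for key, values in merge_runs(infile):
            outfile.write("{}\t{}\n".format(key, ";".join(values)))


# Only run when exactly two paths are supplied on the command line.
if __name__ == "__main__" and len(sys.argv) == 3:
    main(sys.argv)
```

Keeping the grouping in a generator that accepts any iterable of lines means the same logic can be fed from an open file, sys.stdin, or a plain list.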
Upvotes: 2
Reputation: 241788
Perl solution:
perl -lane 'sub output {
        print "$last\t", join ";", @buff;
    }
    $last //= $F[0];
    if ($F[0] ne $last) {
        output();
        undef @buff;
        $last = $F[0];
    }
    push @buff, $F[1];
    }{ output();'
Upvotes: 2