amine

Reputation: 479

Combine rows in linux

Given the input file below, is there a command or other way in Linux to convert it into the desired output file shown further down?

Input file:

Column_1     Column_2  
scaffold_A   SNP_marker1
scaffold_A   SNP_marker2
scaffold_A   SNP_marker3
scaffold_A   SNP_marker4
scaffold_B   SNP_marker5
scaffold_B   SNP_marker6
scaffold_B   SNP_marker7
scaffold_C   SNP_marker8
scaffold_A   SNP_marker9
scaffold_A   SNP_marker10

Desired Output file:

Column_1     Column_2  
scaffold_A   SNP_marker1;SNP_marker2;SNP_marker3;SNP_marker4
scaffold_B   SNP_marker5;SNP_marker6;SNP_marker7
scaffold_C   SNP_marker8
scaffold_A   SNP_marker9;SNP_marker10

I was thinking of using grep, uniq, etc., but couldn't figure out how to get this done.

Upvotes: 0

Views: 356

Answers (5)

Hai Vu

Reputation: 40688

If you don't mind using Python, it has itertools.groupby, which serves this purpose:

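# file: combine.py
import itertools

with open('data.txt') as f:
    # Split each non-blank line into its whitespace-separated columns
    data = [row.split() for row in f if row.strip()]

# groupby merges only consecutive rows that share the same first column
for column1, rows_group in itertools.groupby(data, key=lambda row: row[0]):
    print('{} {}'.format(column1, ';'.join(row[1] for row in rows_group)))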

Save this script as combine.py. Assuming your input file is data.txt in the same directory, run the script to get your desired output:

python combine.py

Discussion

  • The with open(...) block produces data, a list of rows; each row is itself a list of columns.
  • The itertools.groupby function takes an iterable (here, a list) and groups consecutive items that share the same key. The key function tells it how to group rows; here it returns the first column of each row.
  • rows_group is an iterator over the rows that share the same column1.
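
To see why the output contains two separate scaffold_A lines, here is a minimal sketch with a few rows inlined. The combine helper and the sample rows list are made up for illustration; the point is only that groupby merges consecutive rows with the same key, which is what the desired output asks for.

```python
import itertools

def combine(rows):
    """Join column 2 of consecutive rows that share the same column 1."""
    result = []
    for key, group in itertools.groupby(rows, key=lambda row: row[0]):
        result.append((key, ';'.join(row[1] for row in group)))
    return result

# Hypothetical sample: scaffold_A appears in two separate runs.
rows = [
    ('scaffold_A', 'SNP_marker1'),
    ('scaffold_A', 'SNP_marker2'),
    ('scaffold_B', 'SNP_marker5'),
    ('scaffold_A', 'SNP_marker9'),
]

for key, markers in combine(rows):
    print(key, markers)
```

If you wanted every scaffold_A row merged regardless of position, you would have to sort the rows by the first column before grouping, but that is not what the question asks for.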

Upvotes: 0

rook

Reputation: 6240

You could also try the following solution in bash:

cat input.txt | while read L; do y=`echo $L | cut -f1 -d' '`; { test "$x" = "$y" && echo -n ";`echo $L | cut -f2 -d' '`"; } || { x="$y";echo -en "\n$L"; }; done

or, in a more human-readable form:

cat input.txt | while read L;
do
  y=`echo $L | cut -f1 -d' '`;
  {
    test "$x" = "$y" && echo -n ";`echo $L | cut -f2 -d' '`";
  } || 
  {
    x="$y";echo -en "\n$L"; 
  };
done

Note that the formatting of the output relies on the -n and -e options of bash's built-in echo command.

Upvotes: 0

ds_

Reputation: 1

awk solution within a bash script

#!/bin/bash 

awk '
BEGIN{
    str = ""
}
{
    if ( str != $1 ) {
        if ( NR != 1 ){
            printf("\n")
        }
        str = $1
        printf("%s\t%s",$1,$2)
    } else if ( str == $1 ) {
        printf(";%s",$2)
    }
}
END{
        printf("\n")
}' your_file.txt

Upvotes: 0

Wayne Werner

Reputation: 51797

Python solution (as written, it reads from a file named infile and writes to outfile; substitute your own file names):

from __future__ import print_function #not needed with Python3
with open('infile') as infile, open('outfile', 'w') as outfile:
    outfile.write(infile.readline()) # transfer the header
    col_one, col_two = infile.readline().split()
    col_two = [col_two] # make it a list
    for line in infile:
        data = line.split()
        if col_one != data[0]:
            print("{}\t{}".format(col_one, ';'.join(col_two)), file=outfile)
            col_one = data[0]
            col_two = [data[1]]
        else:
            col_two.append(data[1])
    print("{}\t{}".format(col_one, ';'.join(col_two)), file=outfile)
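
If you would rather pass the file names on the command line, a variant along these lines might work. This is only a sketch: the combine_file name and the choice to skip blank or short lines are my own assumptions, and it expects a header line followed by two whitespace-separated columns.

```python
import sys

def combine_file(in_path, out_path):
    """Merge consecutive rows that share column 1 (hypothetical helper)."""
    with open(in_path) as infile, open(out_path, 'w') as outfile:
        outfile.write(infile.readline())  # copy the header unchanged
        key, values = None, []
        for line in infile:
            fields = line.split()
            if len(fields) < 2:
                continue  # assumption: skip blank or malformed lines
            if fields[0] != key:
                if key is not None:
                    # flush the finished group before starting a new one
                    outfile.write('{}\t{}\n'.format(key, ';'.join(values)))
                key, values = fields[0], []
            values.append(fields[1])
        if key is not None:
            # flush the last group
            outfile.write('{}\t{}\n'.format(key, ';'.join(values)))

if __name__ == '__main__' and len(sys.argv) == 3:
    combine_file(sys.argv[1], sys.argv[2])
```

It would be run as python combine_file.py input.txt output.txt (the script name is arbitrary). Unlike the version above, it does not fail on a file that contains only the header.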

Upvotes: 2

choroba

Reputation: 241788

Perl solution:

perl -lane 'sub output {
                print "$last\t", join ";", @buff;
            }
            $last //= $F[0];
            if ($F[0] ne $last) {
               output();
               undef @buff;
               $last = $F[0];
            }
            push @buff, $F[1];
            }{ output();'

Upvotes: 2
