Roy

Reputation: 743

unix bash: separating a specific column into multiple columns

I have a tab-delimited file with three columns. Each row of the third column holds a string of five names separated by spaces (' '), and in some cases more than one space separates two names. I'd like to use a Unix bash command line to print column 1, column 2, name1, name2, name3, name4, name5, all separated by tabs.

My desired output would look like this:

avov2323[tab]rogoc232[tab]Roy[tab]Don[tab]Mike[tab]Ned[tab]Lee
cdso3432[tab]fokfd543[tab]Tom[tab]Gil[tab]Rose[tab]Dan[tab]Sam

However, the command line I tried doesn't work for me: all the names from column 3 get crammed together in $a.

Upvotes: 1

Views: 3470

Answers (3)

Krzysztof Jabłoński

Reputation: 1941

Just for the sake of completeness, I also wrote an awk one-liner that won't touch any spaces in the first two columns. It also preserves empty columns:

awk <inputFile -F '\t' 'BEGIN{OFS="\t"} {gsub(/ +/,OFS,$3); print $1,$2,$3}'
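A quick way to check the claim about empty columns is to feed the one-liner a row whose second column is empty. This is only a sketch: the sample row and the function name split_third are made up for illustration.

```shell
# Hypothetical wrapper around the one-liner above; reads tab-delimited rows from stdin.
split_third() {
  awk -F '\t' 'BEGIN{OFS="\t"} {gsub(/ +/,OFS,$3); print $1,$2,$3}'
}

# Column 2 is empty here (two consecutive tabs); it should survive as an empty field.
printf 'avov2323\t\tRoy  Don Mike\n' | split_third
```

The output should still have two consecutive tabs after avov2323, followed by the three names separated by single tabs.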

Edit: Regarding the improvement mentioned in the comments: yes, it is possible to split any column, even a middle one, though a more versatile script is necessary. It's no longer a one-liner, and it looks quite awkward when squeezed onto one line. I'm pretty sure it could still be optimized somewhat. With formatting:

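BEGIN {
  FS=OFS="\t";
  splitAt=3;   # index of the column to split
}{
  # Replace each run of spaces in the chosen column with a tab.
  gsub(/ +/,OFS,$splitAt);
  # Rebuild the line field by field, joined with tabs
  # (a single loop also avoids duplicating column 1 when splitAt=1).
  line=$1;
  for(i=2;i<=NF;i++)
    line=line OFS $i;
  print line;
}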

And as a one-liner:

awk <inputFile 'BEGIN{FS=OFS="\t"; splitAt=2} {gsub(/ +/,OFS,$splitAt); line=$1; for(i=2;i<=NF;i++) line=line OFS $i; print line}'

It could be refactored to take splitAt as a parameter to the script.
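One possible shape for that refactoring, as a minimal sketch rather than the answer's own code: pass the column index with awk's -v option. The function name split_column is hypothetical. The sketch also relies on the fact that awk rebuilds the record with OFS once a field has been modified, so a plain print is enough here and the explicit loop is dropped.

```shell
# Hypothetical helper: split_column N
# Reads tab-delimited rows from stdin and splits column N on runs of spaces.
split_column() {
  awk -v splitAt="$1" 'BEGIN{FS=OFS="\t"} {gsub(/ +/,OFS,$splitAt); print}'
}

# Split the middle column only; the space inside the last column is left alone.
printf 'x\tA  B C\ty z\n' | split_column 2
```

Used as `split_column 3 <inputFile`, it should behave like the formatted script with splitAt=3.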

Upvotes: 1

Krzysztof Jabłoński

Reputation: 1941

Use tr to translate:

tr <inputFile " " "\t" | tr -s "\t" >outputFile

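Edit: As Glenn Jackman pointed out, it would be better to squeeze the spaces first and then change the remaining spaces to tabs. Squeezing tabs afterwards, as in the first version, would also collapse consecutive tabs that mark an empty column.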

tr <inputFile -s " " | tr " " "\t" >outputFile

It's still vulnerable to spaces in the first two columns, though.
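To illustrate that caveat: tr works on the whole stream and has no notion of columns, so a space inside column 1 or 2 gets converted as well. A small sketch with made-up data (the function name spaces_to_tabs is hypothetical):

```shell
# Hypothetical wrapper around the tr pipeline above; reads from stdin.
spaces_to_tabs() {
  tr -s " " | tr " " "\t"
}

# The first column is "ab cd" (it contains a space), so it is split in two as well.
printf 'ab cd\tx\tRoy  Don\n' | spaces_to_tabs | awk -F '\t' '{print NF " fields"}'
```

The row ends up with five tab-separated fields instead of the intended four.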

Upvotes: 3

Tom Fenech

Reputation: 74615

You could use awk:

$ cat file
avov2323        rogoc232        Roy  Don Mike  Ned Lee
cdso3432        fokfd543        Tom Gil    Rose  Dan Sam
$ awk '{$1=$1}1' OFS='\t' file
avov2323        rogoc232        Roy     Don     Mike    Ned     Lee
cdso3432        fokfd543        Tom     Gil     Rose    Dan     Sam

$1=$1 just touches each record so that the new output format is applied. The trailing 1 evaluates to true, so each line is printed. By default, awk treats any run of whitespace as the input field separator, so as you can see, the number of spaces between the names is not a problem.
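The effect of $1=$1 is easiest to see by running the command with and without it on the same made-up row:

```shell
# Without touching a field, awk prints the record unchanged and OFS is never used.
printf 'a  b\tc\n' | awk '1' OFS='\t'

# Assigning $1 to itself makes awk rebuild the record, joining the fields with OFS.
printf 'a  b\tc\n' | awk '{$1=$1}1' OFS='\t'
```

The first command echoes the row as it came in; the second prints a, b and c separated by single tabs.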

To overwrite the original file, you can use a temporary file:

awk '{$1=$1}1' OFS='\t' file > tmp && mv tmp file
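A slightly more defensive variant of the same idea, offered as a sketch: it assumes mktemp is available (it is not part of POSIX, but GNU coreutils and the BSDs ship it), and the function name retab_in_place is made up for illustration.

```shell
# Hypothetical helper: rewrite the given file in place with tab-separated fields.
retab_in_place() {
  tmp=$(mktemp) || return 1
  # Replace the original only if awk succeeded; otherwise discard the temporary file.
  if awk '{$1=$1}1' OFS='\t' "$1" > "$tmp"; then
    mv "$tmp" "$1"
  else
    rm -f "$tmp"
    return 1
  fi
}
```

Call it as `retab_in_place file`. Note that the rewritten file takes on the temporary file's permissions, which may be stricter than the original's.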

Upvotes: 1
