josuec

Reputation: 1

deleting spaces between every other column

I have a large dataset that looks like this:

ID224912 A A A B B A B A B A B

and I want to make it look like:

ID224912 AA AB BA BA BA BA

I have tried modifying this code, which I found elsewhere, but without success:

AWK=''' { printf ("%s %s %s %s", $1, $2, $3, $4); }
{ for (f = 5; f <= NF; f += 2) printf ("%s %s", $(f), $(f + 1)); }
{ printf ("\n"); } '''
awk "${AWK}" InFile > OutFile

Any suggestions?

Upvotes: 0

Views: 66

Answers (6)

kvantour

Reputation: 26471

The following line

awk '{printf $1}{for(i=2;i<=NF;i+=2) printf OFS $i $(i+1); print "" }'

will output

ID224912 AA AB BA BA BA B

As you can see, there is an extra column B at the end, because the original input has an even number of columns. Since the OP does not want this, we can fix it with a small change to the for-loop condition:

awk '{printf $1}{for(i=2;i<NF;i+=2) printf OFS $i $(i+1); print "" }'

will output

ID224912 AA AB BA BA BA
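
To see why the loop bound matters, here is a small sketch that runs both conditions on the sample line from the question (12 fields in total) and on a made-up line with one more letter (13 fields). The function names `pair_le` and `pair_lt` and the second input line are assumptions for illustration only. The sketch also prints `$1` through an explicit `"%s"` format instead of using it as the format string, so an ID containing `%` would be printed literally.

```shell
# pair_le: loop while i <= NF (may emit a lone trailing letter)
pair_le() {
  awk '{ printf "%s", $1; for (i = 2; i <= NF; i += 2) printf "%s%s%s", OFS, $i, $(i+1); print "" }'
}
# pair_lt: loop while i < NF (drops an unpaired trailing letter)
pair_lt() {
  awk '{ printf "%s", $1; for (i = 2; i < NF; i += 2) printf "%s%s%s", OFS, $i, $(i+1); print "" }'
}

even='ID224912 A A A B B A B A B A B'    # 12 fields: ID + 11 letters (from the question)
odd='ID224912 A A A B B A B A B A B A'   # 13 fields: ID + 12 letters (hypothetical)

printf '%s\n' "$even" | pair_le   # ID224912 AA AB BA BA BA B
printf '%s\n' "$even" | pair_lt   # ID224912 AA AB BA BA BA
printf '%s\n' "$odd"  | pair_le   # ID224912 AA AB BA BA BA BA
printf '%s\n' "$odd"  | pair_lt   # ID224912 AA AB BA BA BA BA
```

With `i<NF` the two variants only differ when the total number of fields is even, which is exactly the case where the last letter has no partner.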

Upvotes: 0

potong

Reputation: 58361

This might work for you (GNU sed):

sed -E 's/((\S+\s\S+\s)*\S+).*/\1/g;s/(\S+\s\S+)\s/\1/g' file

The solution is in two parts. First, trim the line so that the number of spaces between fields is even, deleting an extra trailing field if there is one. Then group the fields in pairs by removing every second space.
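
The two parts can be run separately to watch the intermediate result. This is only a sketch: the function names `trim_to_pairs` and `join_pairs` are made up, the first expression is anchored with `^` and has no `g` flag, and the GNU-only `\S` and `\s` are swapped for the POSIX classes `[^[:space:]]` and `[[:space:]]` so that it should also run on a sed without those extensions (assuming it accepts `-E`).

```shell
# Step 1: keep the longest prefix with an odd number of fields
# (that is, an even number of separating spaces) and drop the rest.
trim_to_pairs() {
  sed -E 's/^(([^[:space:]]+[[:space:]][^[:space:]]+[[:space:]])*[^[:space:]]+).*/\1/'
}
# Step 2: delete every second space, which glues the letters together in pairs.
join_pairs() {
  sed -E 's/([^[:space:]]+[[:space:]][^[:space:]]+)[[:space:]]/\1/g'
}

line='ID224912 A A A B B A B A B A B'
printf '%s\n' "$line" | trim_to_pairs                # ID224912 A A A B B A B A B A
printf '%s\n' "$line" | trim_to_pairs | join_pairs   # ID224912 AA AB BA BA BA
```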

Upvotes: 1

Bsquare ℬℬ

Reputation: 4487

Assuming InFile is your input file, you can use sed this way:

sed -e 's/\([a-zA-Z]\)[ \t]\([a-zA-Z]\)/\1\2/g' InFile

N.B.: with the sample line from your question (which has an odd number of letters), the result is:

ID224912 AA AB BA BA BA B

Upvotes: 0

nsa

Reputation: 565

For funsies here is a sed solution:

cat input | sed 's/\([ A-Z ]\) \([ A-Z ]\)/\1\2/g' > output

Just for clarification, I tested this on BSD sed.

Upvotes: 0

tshiono

Reputation: 22012

  • You do not have to assign the awk script to a variable. Just invoke it inline, which is simpler and safer.
  • It looks strange that you are grouping the first four fields. As far as I can see from your desired output, it would be enough just to treat the first (ID) field separately.

Try something like:

awk '{printf("%s", $1); for (i=2; i<=NF; i+=2) printf(" %s%s", $i, $(i+1)); print ""}' InFile > OutFile

Hope this helps.

Upvotes: 0

Ed Morton

Reputation: 203169

$ awk '{r=$1; for (i=2; i<NF; i+=2) r=r OFS $i $(i+1); print r}' file
ID224912 AA AB BA BA BA

Upvotes: 0
