Reputation: 1

deleting spaces between every other column

I have a large dataset that looks like this:

ID224912 A A A B B A B A B A B

and I want to make it look like:

ID224912 AA AB BA BA BA BA

I have tried modifying this code that I found elsewhere, but without success:

AWK=' { printf ("%s %s %s %s", $1, $2, $3, $4); }
{ for (f = 5; f <= NF; f += 2) printf ("%s %s", $(f), $(f + 1)); }
{ printf ("\n"); } '
awk "${AWK}" InFile > OutFile

Any suggestions?

Upvotes: 0

Answers (6)

kvantour

Reputation: 26471

The following awk line

awk '{printf "%s", $1; for(i=2;i<=NF;i+=2) printf "%s%s%s", OFS, $i, $(i+1); print ""}'

will output

ID224912 AA AB BA BA BA B

As you can see, there is an extra B column at the end, because the input has an even number of fields (the ID plus an odd number of letters). Since the OP does not want it, we can drop it with a simple change to the for-loop condition:

awk '{printf "%s", $1; for(i=2;i<NF;i+=2) printf "%s%s%s", OFS, $i, $(i+1); print ""}'

will output

ID224912 AA AB BA BA BA
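To see the effect of the loop bound side by side, here is a minimal sketch that runs both variants on the sample line from the question. The function names `pairs_keep_last` and `pairs_drop_last` are hypothetical, introduced only for this illustration, and both programs pass the fields to `printf "%s"` so that field contents are never treated as a format string.

```shell
# Hypothetical helper names, for illustration only.
# i <= NF: when NF is even, the last iteration pairs $NF with the empty $(NF+1),
# so the unpaired final field is printed on its own.
pairs_keep_last() {
    awk '{printf "%s", $1; for (i = 2; i <= NF; i += 2) printf "%s%s%s", OFS, $i, $(i+1); print ""}'
}

# i < NF: the loop stops before an unpaired final field, so it is dropped.
pairs_drop_last() {
    awk '{printf "%s", $1; for (i = 2; i < NF; i += 2) printf "%s%s%s", OFS, $i, $(i+1); print ""}'
}

# Sample line from the question: the ID plus 11 letters, i.e. 12 fields.
line='ID224912 A A A B B A B A B A B'
printf '%s\n' "$line" | pairs_keep_last   # ID224912 AA AB BA BA BA B
printf '%s\n' "$line" | pairs_drop_last   # ID224912 AA AB BA BA BA
```

With `i < NF`, a line with an even number of letters (odd NF) still gets all its pairs, because the last pair starts at field `NF - 1`.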

Upvotes: 0

potong

Reputation: 58361

This might work for you (GNU sed):

sed -E 's/((\S+\s\S+\s)*\S+).*/\1/g;s/(\S+\s\S+)\s/\1/g' file

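The solution has two parts. First, truncate the line so that there is an even number of spaces between fields (the ID followed by whole pairs of letters), deleting the extra trailing field if there is one. Then join each pair of fields by removing the space between them.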
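A minimal sketch that runs the two substitutions one at a time, so the intermediate line is visible. It assumes GNU sed (`\s` and `\S` are GNU extensions) and uses the sample line from the question; the variable names `step1` and `step2` are only for illustration.

```shell
# Assumes GNU sed: \s and \S are GNU extensions; -E enables extended regexes.
line='ID224912 A A A B B A B A B A B'

# Step 1: keep the longest prefix made of whole "field space field space" groups
# plus one more field, i.e. the ID and complete pairs of letters. Whatever follows
# (here the unpaired final " B") is matched by .* and discarded.
step1=$(printf '%s\n' "$line" | sed -E 's/((\S+\s\S+\s)*\S+).*/\1/g')
printf '%s\n' "$step1"    # ID224912 A A A B B A B A B A

# Step 2: in each "field space field space" group, delete the trailing space.
# The groups start at the ID ("ID224912 A "), so every deleted space is the one
# between the two letters of a pair.
step2=$(printf '%s\n' "$step1" | sed -E 's/(\S+\s\S+)\s/\1/g')
printf '%s\n' "$step2"    # ID224912 AA AB BA BA BA
```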

Upvotes: 1

Bsquare ℬℬ

Reputation: 4487

Assuming InFile is your input file, you can use sed like this:

sed -e 's/\([a-zA-Z]\)[ \t]\([a-zA-Z]\)/\1\2/g' InFile

N.B.: with the sample input from your question (which has an odd number of letters), the result is:

ID224912 AA AB BA BA BA B
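If you also want to drop the unpaired letter, a second expression can delete a single letter left at the end of the line. The sketch below does that and, assuming portability between GNU and BSD sed matters, swaps `[ \t]` for the POSIX bracket classes `[[:alpha:]]` and `[[:blank:]]`. Like the command above, it assumes the ID does not end in a letter, since such a letter would be joined to the first letter after it.

```shell
line='ID224912 A A A B B A B A B A B'

# Expression 1 (same idea as the answer): join a letter to the next letter across
# one blank. [[:alpha:]] and [[:blank:]] (space or tab) are POSIX bracket classes.
# Expression 2 (the addition): delete a single letter left over at the end of the line.
out=$(printf '%s\n' "$line" |
    sed -e 's/\([[:alpha:]]\)[[:blank:]]\([[:alpha:]]\)/\1\2/g' \
        -e 's/[[:blank:]][[:alpha:]]$//')
printf '%s\n' "$out"    # ID224912 AA AB BA BA BA
```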

Upvotes: 0

nsa

Reputation: 565

For funsies, here is a sed solution:

sed 's/\([A-Z]\) \([A-Z]\)/\1\2/g' input > output

For clarification, I tested this on BSD sed. With the sample input it prints ID224912 AA AB BA BA BA B, so the final unpaired B is kept.

Upvotes: 0

tshiono

Reputation: 22012

You do not have to assign the AWK script to a variable; just invoke it inline, which is simpler and safer.
It also looks strange that you are grouping the first four fields. Judging from your desired output, it is enough to treat only the first (ID) field separately.

Try something like:

awk '{printf("%s", $1); for (i=2; i<=NF; i+=2) printf(" %s%s", $i, $(i+1)); print ""}' InFile > OutFile
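If you do want to keep the program in a shell variable as in the question, a minimal sketch is to single-quote the program when assigning it and double-quote the expansion when calling awk. The variable name `AWK` follows the question; the sample line is piped in instead of reading InFile, and `out` is just an illustrative name.

```shell
# Keep the awk program in a shell variable, as the question does. Single quotes stop
# the shell from expanding $1, $i, etc., so the awk program uses plain double quotes.
AWK='{
    printf "%s", $1                  # the ID field, unchanged
    for (i = 2; i <= NF; i += 2) {   # fields 2 and 3, 4 and 5, ...
        printf " %s%s", $i, $(i + 1)
    }
    print ""                         # end the output line
}'

# Quote the expansion so the whole program reaches awk as a single argument.
out=$(printf '%s\n' 'ID224912 A A A B B A B A B A B' | awk "$AWK")
printf '%s\n' "$out"    # ID224912 AA AB BA BA BA B
```

As with the one-liner above, the bound `i <= NF` prints a final unpaired field on its own; changing it to `i < NF` drops it.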

Hope this helps.

Upvotes: 0

Ed Morton

Reputation: 203169

$ awk '{r=$1; for (i=2; i<NF; i+=2) r=r OFS $i $(i+1); print r}' file
ID224912 AA AB BA BA BA

Upvotes: 0
