Reputation: 1
I have a large dataset that looks like this:
ID224912 A A A B B A B A B A B
and I want to make it look like:
ID224912 AA AB BA BA BA BA
I have tried modifying this code, which I found elsewhere, but with no success:
AWK=''' { printf ("%s %s %s %s", $1, $2, $3, $4); }
{ for (f = 5; f <= NF; f += 2) printf ("%s %s", $(f), $(f + 1)); }
{ printf ("\n"); } '''
awk "${AWK}" InFile > OutFile
Any suggestions?
Upvotes: 0
Views: 66
Reputation: 26471
The following awk line
awk '{printf "%s", $1}{for(i=2;i<=NF;i+=2) printf "%s%s%s", OFS, $i, $(i+1); print "" }'
will output
ID224912 AA AB BA BA BA B
As you can see, there is an extra column B at the end, because the original input has an even number of columns. Since the OP does not want this, we can fix it with a small change to the for-loop condition:
awk '{printf "%s", $1}{for(i=2;i<NF;i+=2) printf "%s%s%s", OFS, $i, $(i+1); print "" }'
will output
ID224912 AA AB BA BA BA
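To check how the i<NF condition behaves for both odd and even numbers of letters, a small sketch along these lines can help. The function name pair_fields is made up for illustration, and the printf calls use explicit "%s" format strings so that a % in the data is not treated as a format specifier.

```shell
# pair_fields is a hypothetical helper name, used only for this sketch.
# It reads lines on stdin, keeps field 1 as-is and joins the remaining
# fields pairwise; an unpaired trailing field is dropped (i < NF).
pair_fields() {
    awk '{
        printf "%s", $1
        for (i = 2; i < NF; i += 2)
            printf "%s%s%s", OFS, $i, $(i + 1)
        print ""
    }'
}

# 11 letters: the last B has no partner and is dropped.
echo "ID224912 A A A B B A B A B A B" | pair_fields
# prints: ID224912 AA AB BA BA BA

# 12 letters: every letter is paired.
echo "ID224912 A A A B B A B A B A B A" | pair_fields
# prints: ID224912 AA AB BA BA BA BA
```

With an even number of letters, i<NF and i<=NF give the same result; they only differ when one letter is left over.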
Upvotes: 0
Reputation: 58361
This might work for you (GNU sed):
sed -E 's/((\S+\s\S+\s)*\S+).*/\1/g;s/(\S+\s\S+)\s/\1/g' file
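The solution is in two parts. The first substitution trims the line so that an even number of spaces remains between the fields, which drops a trailing unpaired field if there is one. The second substitution then removes every second space, joining the remaining fields in pairs.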
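The two parts can be run one at a time to see what each substitution does. This is only a sketch: it assumes GNU sed (for -E with \S and \s), and the variable names line, step1 and step2 are made up for illustration.

```shell
# Sample line from the question (11 letters after the ID).
line='ID224912 A A A B B A B A B A B'

# Part 1: keep whole "field field" pairs plus one more field and
# discard the rest, so an unpaired trailing field is removed.
step1=$(printf '%s\n' "$line" | sed -E 's/((\S+\s\S+\s)*\S+).*/\1/')
printf '%s\n' "$step1"
# prints: ID224912 A A A B B A B A B A

# Part 2: delete the whitespace that follows each pair of fields,
# i.e. every second space, which joins the letters two by two.
step2=$(printf '%s\n' "$step1" | sed -E 's/(\S+\s\S+)\s/\1/g')
printf '%s\n' "$step2"
# prints: ID224912 AA AB BA BA BA
```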
Upvotes: 1
Reputation: 4487
Taking InFile as your input file, you can use sed this way:
cat InFile |sed -e 's/\([a-zA-Z]\)[ \t]\([a-zA-Z]\)/\1\2/g'
N.B.: with the specified InFile in your initial question (with an odd count of letters), the result is:
ID224912 AA AB BA BA BA B
Upvotes: 0
Reputation: 565
For funsies here is a sed solution:
cat input | sed 's/\([ A-Z ]\) \([ A-Z ]\)/\1\2/g' > output
Just for clarification, I tested this on BSD sed.
Upvotes: 0
Reputation: 22012
Try something like:
awk '{printf("%s", $1); for (i=2; i<=NF; i+=2) printf(" %s%s", $i, $(i+1)); print ""}' InFile > OutFile
Hope this helps.
Upvotes: 0
Reputation: 203169
$ awk '{r=$1; for (i=2; i<NF; i+=2) r=r OFS $i $(i+1); print r}' file
ID224912 AA AB BA BA BA
Upvotes: 0