Reputation: 13
I am new to Linux and the command line. I am trying to find a command that will allow me to replace white space (in a .csv text file) with a semicolon for all fields except the first. Please see the example below. Any help would be gratefully received; I have spent a long time looking for a solution. If you do have an answer, could you please explain the command so I can try to learn how and why it works? Many thanks.
Example of input text:
0 k__Bacteria p__Firmicutes c__Bacilli
1 k__Bacteria p__Firmicutes c__Clostridia
2 k__Bacteria p__Bacteroidetes c__Bacteroidia
3 k__Bacteria p__Bacteroidetes c__Bacteroidia
What I need the output to be:
0 k__Bacteria;p__Firmicutes;c__Bacilli
1 k__Bacteria;p__Firmicutes;c__Clostridia
2 k__Bacteria;p__Bacteroidetes;c__Bacteroidia
3 k__Bacteria;p__Bacteroidetes;c__Bacteroidia
Upvotes: 1
Views: 1380
Reputation: 204259
$ cat file
0 k__Bacteria p__Firmicutes c__Bacilli foo bar
1 k__Bacteria p__Firmicutes c__Clostridia the quick brown
2 k__Bacteria p__Bacteroidetes c__Bacteroidia fox jumped over
3 k__Bacteria p__Bacteroidetes c__Bacteroidia the lazy dogs back
$ awk -v skip=1 '{match($0,"([^[:space:]]+[[:space:]]+){"skip"}"); head=substr($0,1,RSTART+RLENGTH); tail=substr($0,RSTART+RLENGTH+1); gsub(/[[:space:]]+/,";",tail); print head tail}' file
0 k__Bacteria;p__Firmicutes;c__Bacilli;foo;bar
1 k__Bacteria;p__Firmicutes;c__Clostridia;the;quick;brown
2 k__Bacteria;p__Bacteroidetes;c__Bacteroidia;fox;jumped;over
3 k__Bacteria;p__Bacteroidetes;c__Bacteroidia;the;lazy;dogs;back
$ awk -v skip=2 '{match($0,"([^[:space:]]+[[:space:]]+){"skip"}"); head=substr($0,1,RSTART+RLENGTH); tail=substr($0,RSTART+RLENGTH+1); gsub(/[[:space:]]+/,";",tail); print head tail}' file
0 k__Bacteria p__Firmicutes;c__Bacilli;foo;bar
1 k__Bacteria p__Firmicutes;c__Clostridia;the;quick;brown
2 k__Bacteria p__Bacteroidetes;c__Bacteroidia;fox;jumped;over
3 k__Bacteria p__Bacteroidetes;c__Bacteroidia;the;lazy;dogs;back
$ awk -v skip=3 '{match($0,"([^[:space:]]+[[:space:]]+){"skip"}"); head=substr($0,1,RSTART+RLENGTH); tail=substr($0,RSTART+RLENGTH+1); gsub(/[[:space:]]+/,";",tail); print head tail}' file
0 k__Bacteria p__Firmicutes c__Bacilli;foo;bar
1 k__Bacteria p__Firmicutes c__Clostridia;the;quick;brown
2 k__Bacteria p__Bacteroidetes c__Bacteroidia;fox;jumped;over
3 k__Bacteria p__Bacteroidetes c__Bacteroidia;the;lazy;dogs;back
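For readers who find the awk one-liner dense, the same head/tail idea can be sketched in Python: match the first `skip` fields together with their trailing whitespace, keep that prefix untouched, and replace every remaining run of whitespace with a semicolon. The function name `join_tail` and its `sep` parameter are illustrative names, not part of the answer above, and this is only a sketch of the technique.

```python
import re

def join_tail(line, skip=1, sep=";"):
    """Keep the first `skip` fields as-is; join the remaining fields with `sep`."""
    # Drop the newline and any trailing whitespace so no trailing `sep` appears.
    line = line.rstrip()
    # Match `skip` repetitions of "run of non-space, then run of space".
    m = re.match(r"(?:\S+\s+){%d}" % skip, line)
    if m is None:
        # No match (e.g. the line has no more than `skip` fields): leave it alone.
        return line
    head, tail = line[:m.end()], line[m.end():]
    # Every whitespace run in the tail becomes a single separator.
    return head + re.sub(r"\s+", sep, tail)
```

Calling `join_tail(line, skip=2)` corresponds to the `-v skip=2` run above: the first two fields keep their original spacing and only the rest is joined.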
Upvotes: 1
Reputation: 67301
awk -v OFS=";" '{$1=$1" "$2;$2="";gsub(/;;/,";",$0);print}' your_file
Or maybe in Perl:
perl -lane 'print join ";",@F' your_file | perl -pe 's/;/ /'
Upvotes: 0
Reputation: 31568
Here is a solution in awk. It is a bit dirty and someone can probably refine it, but it works:
awk 'OFS=";"{a=$1;$1="";$0=a";"$0}sub(/;;/," ",$0) ' temp.txt
Output is
0 k_Bacteria;p_Firmicutes;c_Bacilli
1 k_Bacteria;p_Firmicutes;c_Clostridia
2 k_Bacteria;p_Bacteroidetes;c_Bacteroidia
3 k_Bacteria;p_Bacteroidetes;c_Bacteroidia
cat temp.txt
0 k_Bacteria p_Firmicutes c_Bacilli
1 k_Bacteria p_Firmicutes c_Clostridia
2 k_Bacteria p_Bacteroidetes c_Bacteroidia
3 k_Bacteria p_Bacteroidetes c_Bacteroidia
EDIT: Update as per comments
Try this awk script, myawk.sh:
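BEGIN { print "Begin Processing " }
OFS=";" {
    $9 = $9 "%%"            # mark the end of field 9; this rebuilds $0 with ";" between fields
    split($0, a, "%%")      # a[1] = fields 1 to 9, a[2] = everything after the marker
    gsub(/;/, " ", a[1])    # put the spaces back in the part to keep
    print a[1] a[2]
}
END { print "Process Complete" }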
Execute it with awk -f myawk.sh temp.txt
where $9 is the last field up to which you want to keep the spaces.
Upvotes: 0
Reputation: 574
You could do it in python like this:
#!/usr/bin/env python
import sys

if __name__ == '__main__':
    for line in sys.stdin:
        cols = line.split()
        if cols:  # skip blank lines
            # first column, a space, then the remaining columns joined by ';'
            print(' '.join([cols[0], ';'.join(cols[1:])]))
Just make the file executable with chmod +x script and run it as ./script < input.
Note that line.split() splits on any run of whitespace, so 'a b\tc' yields ['a', 'b', 'c'].
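Building on that behaviour of split(), a small variant can pass a maxsplit argument so that only the first column is split off and the remainder is re-joined with semicolons. The helper name `first_then_semicolons` is hypothetical; treat this as a sketch rather than a replacement for the script above.

```python
def first_then_semicolons(line):
    """Return the first column, a space, then the other columns joined by ';'."""
    # maxsplit=1 splits off only the first whitespace-delimited column.
    parts = line.split(None, 1)
    if len(parts) < 2:
        # Empty line or a single column: nothing to join.
        return line.strip()
    first, rest = parts
    # rest.split() treats any run of spaces or tabs as one delimiter.
    return first + " " + ";".join(rest.split())

# Mixed tabs and repeated spaces collapse to single semicolons.
print(first_then_semicolons("0 k__Bacteria\tp__Firmicutes  c__Bacilli"))
# prints: 0 k__Bacteria;p__Firmicutes;c__Bacilli
```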
Upvotes: 0