I was messing around with awk because I think it's far simpler to munge the header of a tab-delimited or CSV file with this tool.
I have two types of files (either comma- or tab-delimited), and all I would like to do is modify the header (the first line, NR == 1), which currently reads:
Cancer Type, Assembly Version, Chromosome, Chromosome start, Chromosome end
All I've managed to do so far is print the first line:
awk 'NR == 1' test2.csv
Well, I'm at a loss. In any case, I'll probably run this script (sed or awk) before doing some downstream modifications.
Any help (or pointers to a good tutorial or one-liners) would be much appreciated.
EDIT
To clarify: I will start with a file and end up with the same file, but with the header changed.
I could get either of two versions of the file.
The CSV, before:
Cancer Type, Assembly Version, Chromosome, Chromosome start, Chromosome end
After:
cancer_type, assembly_version, chromosome, chromosome_start, chromosome_end
The TSV, before:
Cancer Type\t Assembly Version\t Chromosome\t Chromosome start\t Chromosome end
After:
cancer_type\t assembly_version\t chromosome\t chromosome_start\t chromosome_end
Having said that, I think the suggested approaches are almost working.
EDIT 2: The OS is OS X 10.7+.
Answer (by the asker):
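Both commands worked, but on OS X you first have to install GNU sed, because the BSD sed that ships with OS X supports neither \L nor \b:
brew install gnu-sed
Then run the sed command as gsed:
gsed -i '1{s/\b \b/_/g;s/[[:upper:]]/\L&/g;}' infile
Works like magic. Thanks, guys.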
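If installing GNU sed is not an option, a sketch that sticks to POSIX sed features (so it should also run with the BSD sed on OS X) can replace \L with the y command and \b with explicit character classes. It avoids -i, because GNU and BSD sed disagree on its argument, and writes through a temporary file instead. The file name infile and the sample contents are only for illustration.

```shell
#!/bin/sh
# Illustrative setup: a scratch directory with a small sample file.
cd "$(mktemp -d "${TMPDIR:-/tmp}/hdr.XXXXXX")" || exit 1
printf 'Cancer Type, Assembly Version, Chromosome, Chromosome start, Chromosome end\nOne 1,Two 2\n' > infile

# Only line 1 is touched:
#  - y/// maps upper-case ASCII letters to lower case (portable stand-in for \L)
#  - the s command turns a space between two alphanumerics into "_",
#    so the ", " separators between columns are left alone
sed '1{
y/ABCDEFGHIJKLMNOPQRSTUVWXYZ/abcdefghijklmnopqrstuvwxyz/
s/\([[:alnum:]]\) \([[:alnum:]]\)/\1_\2/g
}' infile > infile.tmp && mv infile.tmp infile

cat infile
```

The character-class substitution assumes multi-letter words: in "a b c" only the first space would be replaced, because the g flag does not re-examine the "b" it already consumed.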
Answer:
If I understood correctly, the OP wants to replace the header in the original file, not just print the result to the console.
At first I tried to solve it with awk, as I know it better. But standard awk has no in-place editing feature, so a shell workaround is needed:
# Unsafe hack
#{ rm infile; awk 'NR==1{...}1' >infile;} <infile
# Ed Morton's correction: write to a temp file, then rename it over the original
awk 'NR==1{...}1' infile >tmp && mv tmp infile
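The temp-file pattern can be wrapped in a small shell function. This is a sketch, and the name in_place is hypothetical. Using mktemp avoids clobbering an existing file that happens to be called tmp, and the original is replaced only when the filter succeeds.

```shell
#!/bin/sh
# in_place FILE CMD [ARGS...]   (hypothetical helper)
# Runs CMD with FILE on stdin, writes its output to a temp file next to
# FILE, and replaces FILE only if CMD exits successfully.
in_place() {
    file=$1
    shift
    tmp=$(mktemp "$file.XXXXXX") || return 1
    if "$@" < "$file" > "$tmp"; then
        mv "$tmp" "$file"
    else
        rm -f "$tmp"    # leave the original untouched on failure
        return 1
    fi
}

# Illustrative usage on a scratch copy of the sample data.
cd "$(mktemp -d "${TMPDIR:-/tmp}/hdr.XXXXXX")" || exit 1
printf 'Cancer Type, Assembly Version\nOne 1,Two 2\n' > infile
in_place infile awk 'NR == 1 { $0 = tolower($0) } 1'
cat infile
```

mktemp creates the temporary file readable and writable only by its owner, and mv carries that mode over, so run chmod afterwards if the original permissions matter.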
This works, but it costs an extra fork for the rm command (mv in the corrected version). It would be better to use in-place editing, which sed and perl support. Using perl would be a bit of overkill, so I slightly corrected captha's sed solution:
sed -i '1{s/\b \b/_/g;s/[[:upper:]]/\L&/g;}' infile
The infile before:
Cancer Type, Assembly Version, Chromosome, Chromosome start, Chromosome end
One 1,Two 2
The infile after:
cancer_type, assembly_version, chromosome, chromosome_start, chromosome_end
One 1,Two 2
Answer:
Code for GNU sed:
sed -r '1 {s/.*/\L&/;s/\b\s\b/_/g}' infile > outfile
$ echo Cancer Type, Assembly Version, Chromosome, Chromosome start, Chromosome end | sed -r '1 {s/.*/\L&/;s/\b\s\b/_/g}'
cancer_type, assembly_version, chromosome, chromosome_start, chromosome_end
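One caveat when reusing this on the TSV file: \s matches any whitespace, tab included, so if the real TSV header has no space after each tab, the tab itself sits between two word characters and is replaced by "_" as well. A literal space, as in the \b \b variant used elsewhere in this thread, leaves tabs alone. The sketch below compares both on a tab-separated header built with printf; it assumes GNU sed, since -r, \L, \b and \s are GNU extensions.

```shell
#!/bin/sh
# Requires GNU sed: -r, \L, \b and \s are GNU extensions.
header=$(printf 'Cancer Type\tAssembly Version\tChromosome start')

# \s also matches the tab between "type" and "assembly", so the
# column separator is turned into "_" too.
with_s=$(printf '%s\n' "$header" | sed -r '1 {s/.*/\L&/;s/\b\s\b/_/g}')

# A literal space only joins words inside a column name; tabs survive.
with_space=$(printf '%s\n' "$header" | sed -r '1 {s/.*/\L&/;s/\b \b/_/g}')

printf '%s\n' "$with_s"      # cancer_type_assembly_version_chromosome_start
printf '%s\n' "$with_space"  # cancer_type<TAB>assembly_version<TAB>chromosome_start
```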
Answer:
Maybe I don't fully understand your question, but as far as I understand, this should solve it:
head -n 1 test2.csv | sed -e 's/\(.*\)/\L\1/' -e 's/\b \b/_/g' > tmp.txt
tail -n +2 test2.csv >> tmp.txt
- head picks the first line
- the first sed expression makes everything lower-case
- the second sed expression converts each space between two words to an underscore, so the ", " column separators are kept
- tail prints everything starting at line 2
tmp.txt now contains the complete result.
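The same split (header first, then the rest) can also be done in a single redirection, with the result moved back over the original file. This is a sketch: it relies on read leaving the file offset just after the first line, which POSIX requires of utilities reading a seekable file, and it swaps GNU \L for tr so it does not need GNU sed. The sample contents of test2.csv are illustrative.

```shell
#!/bin/sh
# Illustrative sample file.
cd "$(mktemp -d "${TMPDIR:-/tmp}/hdr.XXXXXX")" || exit 1
printf 'Cancer Type, Assembly Version, Chromosome start\nOne 1,Two 2\nThree 3,Four 4\n' > test2.csv

{
    # Consume only the header line; the file offset now points at line 2.
    IFS= read -r header
    # Lower-case the header, then join words inside a column name with "_".
    printf '%s\n' "$header" |
        tr '[:upper:]' '[:lower:]' |
        sed 's/\([[:alnum:]]\) \([[:alnum:]]\)/\1_\2/g'
    # Copy the remaining lines unchanged.
    cat
} < test2.csv > tmp.txt && mv tmp.txt test2.csv

cat test2.csv
```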
Answer:
If you want to modify only the header and print the remaining lines as-is, try something like this with GNU awk:
awk 'BEGIN{FS=OFS=","}NR==1{$0=tolower($0);gsub(/\y \y/,"_",$0)}1' csv
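\y is a GNU awk extension, so the command above will not work with the BSD awk that ships with OS X. A sketch using only POSIX awk functions (tolower, split, sub, gsub) can instead split the header on its delimiter and replace spaces only inside each column name. The delimiter is guessed from the header itself (tab if present, otherwise comma); that heuristic, the function name normalize_header, and the sample files are assumptions for illustration.

```shell
#!/bin/sh
# normalize_header FILE   (hypothetical helper)
# Prints FILE with line 1 lower-cased and multi-word column names joined
# by "_"; every other line is printed unchanged. POSIX awk features only.
normalize_header() {
    awk '
    NR == 1 {
        d = index($0, "\t") ? "\t" : ","      # guess the delimiter
        n = split(tolower($0), col, d)
        out = ""
        for (i = 1; i <= n; i++) {
            lead = col[i]
            sub(/[^ ].*/, "", lead)           # keep leading spaces, e.g. after a comma
            name = substr(col[i], length(lead) + 1)
            gsub(/ +/, "_", name)             # spaces inside the name become "_"
            out = out (i > 1 ? d : "") lead name
        }
        print out
        next
    }
    { print }
    ' "$1"
}

# Illustrative usage with both kinds of input.
cd "$(mktemp -d "${TMPDIR:-/tmp}/hdr.XXXXXX")" || exit 1
printf 'Cancer Type, Assembly Version, Chromosome end\nOne 1,Two 2\n' > test.csv
printf 'Cancer Type\tAssembly Version\tChromosome end\nOne 1\tTwo 2\n' > test.tsv
normalize_header test.csv
normalize_header test.tsv
```

To change the file itself, redirect the output to a temporary file and mv it back, as in the other answers.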