Reputation: 687

awk or sed help for munging first line of a CSV or TSV file

I was messing around with awk because I think it's far simpler to munge the header of a tab delimited or csv file with this tool..

I have two types of files (either comma, or tab delimited) and all I would like to do is to modify the header (NR =1) to:

lowers the case of all the words
replace any spaces with underscores for each field name.. Ex. changing Cancer Type bellow to *cancer_type*

Cancer Type, Assembly Version, Chromosome, Chromosome start, Chromosome end

All I've managed to do so far is to list the first line

awk 'NR == 1' test2.csv

Well I'm at a loss. In any case I'll probably run this script (sed or awk) prior to doing some downstream modifications.

Any help (or pointing me to a good tutorial/one liners) would be much appreciated.

EDIT

Hi I should edit to clarify this. I will be taking starting with a file, and ending with the same file but with the header changed.

I could get two versions of the file.

The CSV

Cancer Type, Assembly Version, Chromosome, Chromosome start, Chromosome end

After:

cancer_type, assembly_version, chromosome, chromosome_start, chromosome_end

The TSV

Cancer Type\t Assembly Version\t Chromosome\t Chromosome start\t Chromosome end

After:

cancer_type\t assembly_version\t chromosome\t chromosome_start\t chromosome_end

Having said that I think approaches are almost working..

EDIT 2 The os is OS X 10.7.+

Upvotes: 3

Answers (5)

prussiap

Reputation: 687

Hey guys both commands worked but for OS X you have to

brew install gnu-sed

then run your sed command

gsed -i '1{s/\b \b/_/g;s/[[:upper:]]/\L&/g;}' infile

magic.. thanks guys.

Upvotes: 0

TrueY

Reputation: 7610

If I understood well OP wants to replace the header of the original file, not just print out the result to the console.

At first I tried to solve it with awk, as I know it better. But awk has not inplace editing feature, so some bash workaround is needed:

# Unsafe hack
#{ rm infile; awk 'NR==1{...}1' >infile;} <infile
#Ed Morton's correction
awk 'NR==1{...}1' infile >tmp && mv tmp infile

This works, but it uses 1 extra fork for the rm command. It would be better to use inplace editing. sed or perl supports this feature. To use perl is a little bit overkill, so I corrected a little bit captha's sed solution:

sed -i '1{s/\b \b/_/g;s/[[:upper:]]/\L&/g;}' infile

The infile before:

Cancer Type, Assembly Version, Chromosome, Chromosome start, Chromosome end
One 1,Two 2

The infile after:

cancer_type, assembly_version, chromosome, chromosome_start, chromosome_end
One 1,Two 2

Upvotes: 2

captcha

Reputation: 3756

Code for GNU sed

sed -r '1 {s/.*/\L&/;s/\b\s\b/_/g}' infile>outfile

$ echo Cancer Type, Assembly Version, Chromosome, Chromosome start, Chromosome end|sed -r '1 {s/.*/\L&/;s/\b\s\b/_/g}'
cancer_type, assembly_version, chromosome, chromosome_start, chromosome_end

Upvotes: 4

sjngm

Reputation: 12861

Maybe I don't fully understand your question, but as far as I understood this should solve it:

head -1 test2.csv | sed -e 's/\(.*\)/\L\1/' -e 's/ /_/g' > tmp.txt
tail -n +2 test2.csv >> tmp.txt

head picks the first line
the first sed option makes everything lower-case
the second sed option converts all spaces to underscores
tail prints everything starting at line 2

tmp.txt now contains the complete result.

Upvotes: 0

jaypal singh

Reputation: 77085

If you want to modify only the header and print the remaining lines as is then try something like this with GNU awk:

awk 'BEGIN{FS=OFS=","}NR==1{$0=tolower($0);gsub(/\y \y/,"_",$0)}1' csv

Upvotes: 2

awk or sed help for munging first line of a CSV or TSV file

Answers (5)

Related Questions