Michael BW
Michael BW

Reputation: 1070

Finding the max number of characters in a CSV file column

I am a Mac Newbie and need to import csv files into a mssql database. Large csv files. I am using the import wizard for the actual import but I am having a hard time determining the longest string for each column.

I have found the following command line which will show the values in the selected column:

cut -d, -f<column number> <path to csv file name>

Now I would like to output the longest length of that column. Props to anyone who could help me return the length for all columns.

Thanks in advance

Upvotes: 0

Views: 1714

Answers (2)

Archemar
Archemar

Reputation: 571

as @jpw sugested

my answer:

why not use awk ?

I don't have a mac to test, but length() is a pretty standard function in awk, so this should work.

awk file:

 { for (i=1;i<=NF;i++) {
    l=length($i) ;
    if ( l > linesize[i] ) linesize[i]=l ;
  }
}
END {
    for (l in linesize) printf "Columen%d: %d\n",l,linesize[l] ;
}

then run

mybox$ awk -F, -f test.awk  a.txt
Columen4: 4
Columen5: 3
Columen1: 6
Columen2: 7
Columen3: 4

To get columns sorted :

BEGIN{ maxi=-1 ; }
 { for (i=1;i<=NF;i++) {
    l=length($i) ;
    if ( l > linesize[i] ) linesize[i]=l ;
        if ( i > maxi ) maxi = i ;
   }
  }
END {
    for (i=1;i<=maxi;i++) printf "Columen%d: %d\n",i,linesize[i] ;
}

gives:

mybox$ awk -F, -f test.awk a.txt
Columen1: 6
Columen2: 7
Columen3: 4
Columen4: 4
Columen5: 3

Upvotes: 4

jpw
jpw

Reputation: 44891

The first part of your question is answered in the post flagged as duplicate, but to apply that for all columns this bash one-line script will give you the max length for all columns (based on the first line of the file):

for((i=1;i< `head -1 test.csv|awk '{print NF}' FS=,`+1 ;i++)); do echo  | xargs echo -n "Column$i: " && cut -d, -f $i test.csv |wc -L  ; done

To explain how it works, the for loop starts from 1 and uses awk to determine the number of columns, and then does the cut -d, -f <column number> with pipe (|) through wc -L to return max length.

The output from my sample file with five columns:

Column1: 6
Column2: 7
Column3: 4
Column4: 4
Column5: 3

You could even put it in a shell-script that takes the filename of the csv file as argument:

#!/bin/bash

for((i=1;i< `head -1 $1|awk '{print NF}' FS=,`+1 ;i++));
do echo  | xargs echo -n "Column$i: "
&& cut -d, -f $i $1 |wc -L  ; done

Save this as something.sh, then do chmod + x something.sh and execute it with ./something.sh your_csv_file.csv

Upvotes: 2

Related Questions