Donald Seinen
Donald Seinen

Reputation: 4419

Count rows and columns for multiple CSV files and make new file

I have multiple large comma separated CSV files in a directory. But, as a toy example:

This is what the files look like -

# one.csv
  a b 
1 1 3
2 2 2
3 3 1

# two.csv
  c d e f g
1 4 1 1 4 1
2 3 2 2 3 2
3 2 3 3 2 3
4 1 4 4 1 4

The goal is to make a new .txt or .csv that gives the rows and columns for each:

one 3 2  
two 4 5

To get the rows and columns (and dump it into a file) for a single file

$ awk -F "," '{print NF}' *.csv | sort | uniq -c > dims.txt

But I'm not understanding the syntax to get counts for multiple files.

What I've tried
$ awk '{for (i=1; i<=2; i++) -F "," '{print NF}' *.csv$i | sort | uniq -c}'

Upvotes: 3

Views: 841

Answers (5)

dawg
dawg

Reputation: 104032

Without GNU awk you can use the shell plus POSIX awk this way:

for fn in *.csv; do
    cols=$(awk '{print NF; exit}' "$fn")
    rows=$(awk 'END{print NR-1}' "$fn")
    printf "%s %s %s\n" "${fn%.csv}" "$rows" "$cols" 
done

Prints:

one 3 2
two 4 5

Upvotes: 1

RavinderSingh13
RavinderSingh13

Reputation: 133680

With any awk, you could try following awk program.

awk '
FNR==1{
  if(cols && rows){
    print file,rows,cols
  }
  rows=cols=file=""
  file=FILENAME
  sub(/\..*/,"",file)
  cols=NF
  next
}
{
  rows=(FNR-1)
}
END{
  if(cols && rows){
    print file,rows,cols
  }
}
' one.csv two.csv

Explanation: Adding detailed explanation for above solution.

awk '                       ##Starting awk program from here.
FNR==1{                     ##Checking condition if this is first line of each line then do following.
  if(cols && rows){         ##Checking if cols AND rows are NOT NULL then do following.
    print file,rows,cols    ##Printing file, rows and cols variables here.
  }
  rows=cols=file=""         ##Nullifying rows, cols and file here.
  file=FILENAME             ##Setting FILENAME value to file here.
  sub(/\..*/,"",file)       ##Removing everything from dot to till end of value in file.
  cols=NF                   ##Setting NF values to cols here.
  next                      ##next will skip all further statements from here.
}
{
  rows=(FNR-1)              ##Setting FNR-1 value to rows here.
}
END{                        ##Starting END block of this program from here.
  if(cols && rows){         ##Checking if cols AND rows are NOT NULL then do following.
    print file,rows,cols    ##Printing file, rows and cols variables here.
  }
}
' one.csv two.csv           ##Mentioning Input_file names here.

Upvotes: 3

anubhava
anubhava

Reputation: 785691

Using gnu awk you can do this in a single awk:

awk -F, 'ENDFILE {
   print gensub(/\.[^.]+$/, "", "1", FILENAME), FNR-1, NF-1
}' one.csv two.csv > dims.txt

cat dims.txt

one 3 2
two 4 5

Upvotes: 2

Daweo
Daweo

Reputation: 36660

I would harness GNU AWK's ENDFILE for this task as follows, let content of one.csv be

1,3
2,2
3,1

and two.csv be

4,1,1,4,1
3,2,2,3,2
2,3,3,2,3
1,4,4,1,4

then

awk 'BEGIN{FS=","}ENDFILE{print FILENAME, FNR, NF}' one.csv two.csv

output

one.csv 3 2
two.csv 4 5

Explanation: ENDFILE is executed after processing every file, I set FS to , assuming that fields are ,-separated and there is not , inside filed, FILENAME, FNR, NF are built-in GNU AWK variables: FNR is number of current row in file, i.e. in ENDFILE number of last row, NF is number of fileds (again of last row). If you have files with headers use FNR-1, if you have rows prepended with row number use NF-1.

edit: changed NR to FNR

Upvotes: 1

Liviu Stirb
Liviu Stirb

Reputation: 6075

You will need to iterate over all CSVs print the name for each file and the dimensions

for i in *.csv; do awk -F "," 'END{print FILENAME, NR, NF}' $i; done > dims.txt

If you want to avoid awk you can also do it wc -l for lines and grep -o "CSV-separator" | wc -l for fields

Upvotes: 1

Related Questions