Reputation: 4419
I have multiple large comma-separated CSV files in a directory. As a toy example:
one.csv has 3 rows, 2 columns
two.csv has 4 rows, 5 columns
This is what the files look like:
# one.csv
a b
1 1 3
2 2 2
3 3 1
# two.csv
c d e f g
1 4 1 1 4 1
2 3 2 2 3 2
3 2 3 3 2 3
4 1 4 4 1 4
The goal is to make a new .txt or .csv that gives the rows and columns for each:
one 3 2
two 4 5
To get the rows and columns (and dump them into a file) for a single file:
$ awk -F "," '{print NF}' *.csv | sort | uniq -c > dims.txt
But I'm not understanding the syntax to get counts for multiple files.
$ awk '{for (i=1; i<=2; i++) -F "," '{print NF}' *.csv$i | sort | uniq -c}'
Upvotes: 3
Views: 841
Reputation: 104032
Without GNU awk you can use the shell plus POSIX awk this way:
for fn in *.csv; do
cols=$(awk '{print NF; exit}' "$fn")
rows=$(awk 'END{print NR-1}' "$fn")
printf "%s %s %s\n" "${fn%.csv}" "$rows" "$cols"
done
Prints:
one 3 2
two 4 5
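If reading each file twice is a concern, both numbers can be gathered in one pass per file (a minimal sketch, assuming as above that the first line is a header whose field count gives the column count):
for fn in *.csv; do
  # NR==1: remember the header's field count; END: print name, data rows, columns
  awk -v name="${fn%.csv}" 'NR==1{cols=NF} END{print name, NR-1, cols}' "$fn"
done > dims.txt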
Upvotes: 1
Reputation: 133680
With any awk, you could try the following awk program.
awk '
FNR==1{
  if(cols && rows){
    print file,rows,cols
  }
  rows=cols=file=""
  file=FILENAME
  sub(/\..*/,"",file)
  cols=NF
  next
}
{
  rows=(FNR-1)
}
END{
  if(cols && rows){
    print file,rows,cols
  }
}
' one.csv two.csv
Explanation: a detailed explanation of the above solution.
awk '                      ##Starting awk program from here.
FNR==1{                    ##Checking condition: if this is the first line of a file then do following.
  if(cols && rows){        ##Checking if cols AND rows are NOT NULL then do following.
    print file,rows,cols   ##Printing file, rows and cols variables here.
  }
  rows=cols=file=""        ##Nullifying rows, cols and file here.
  file=FILENAME            ##Setting FILENAME value to file here.
  sub(/\..*/,"",file)      ##Removing everything from the first dot to the end of the value in file.
  cols=NF                  ##Setting NF value to cols here.
  next                     ##next will skip all further statements from here.
}
{
  rows=(FNR-1)             ##Setting FNR-1 value to rows here.
}
END{                       ##Starting END block of this program from here.
  if(cols && rows){        ##Checking if cols AND rows are NOT NULL then do following.
    print file,rows,cols   ##Printing file, rows and cols variables here.
  }
}
' one.csv two.csv          ##Mentioning Input_file names here.
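To cover every CSV in the directory and write to dims.txt as the question asks, the same program can be pointed at a glob and redirected (a sketch, with the program condensed but otherwise unchanged):
awk '
FNR==1{
  if(cols && rows) print file,rows,cols
  rows=cols=file=""; file=FILENAME; sub(/\..*/,"",file); cols=NF; next
}
{ rows=(FNR-1) }
END{ if(cols && rows) print file,rows,cols }
' *.csv > dims.txt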
Upvotes: 3
Reputation: 785691
Using GNU awk you can do this in a single awk command:
awk -F, 'ENDFILE {
print gensub(/\.[^.]+$/, "", "1", FILENAME), FNR-1, NF-1
}' one.csv two.csv > dims.txt
cat dims.txt
one 3 2
two 4 5
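The FNR-1 and NF-1 adjust for the header line and the leading row-index column in the question's sample files; for plain comma-separated files without either, the adjustments could be dropped (a sketch under that assumption):
awk -F, 'ENDFILE { print gensub(/\.[^.]+$/, "", 1, FILENAME), FNR, NF }' *.csv > dims.txt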
Upvotes: 2
Reputation: 36660
I would harness GNU AWK's ENDFILE for this task as follows. Let the content of one.csv be
1,3
2,2
3,1
and two.csv be
4,1,1,4,1
3,2,2,3,2
2,3,3,2,3
1,4,4,1,4
then
awk 'BEGIN{FS=","}ENDFILE{print FILENAME, FNR, NF}' one.csv two.csv
output
one.csv 3 2
two.csv 4 5
Explanation: ENDFILE is executed after processing every file. I set FS to , assuming that fields are ,-separated and there is no , inside a field. FILENAME, FNR, and NF are built-in GNU AWK variables: FNR is the number of the current row in the file, i.e. in ENDFILE the number of the last row, and NF is the number of fields (again of the last row). If you have files with headers use FNR-1; if you have rows prepended with a row number use NF-1.
edit: changed NR to FNR
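For the question's original files, which do have a header line and a leading row-index column, the same ENDFILE idea with the adjustments above, plus stripping the extension, might look like this (a sketch; name is just a helper variable):
awk 'BEGIN{FS=","} ENDFILE{name=FILENAME; sub(/\.csv$/,"",name); print name, FNR-1, NF-1}' one.csv two.csv > dims.txt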
Upvotes: 1
Reputation: 6075
You will need to iterate over all CSVs, printing the name and the dimensions for each file:
for i in *.csv; do awk -F "," 'END{print FILENAME, NR, NF}' "$i"; done > dims.txt
If you want to avoid awk, you can also do it with wc -l for lines and grep -o "CSV-separator" | wc -l for fields.
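A minimal sketch of that wc/grep approach, assuming comma-separated fields with no commas embedded inside fields:
for i in *.csv; do
  rows=$(( $(wc -l < "$i") ))                              # line count
  cols=$(( $(head -n 1 "$i" | grep -o "," | wc -l) + 1 ))  # commas on first line + 1
  printf '%s %s %s\n' "${i%.csv}" "$rows" "$cols"
done > dims.txt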
Upvotes: 1