Reputation: 11
I am new to terminal commands. I know that if the files were local, I could do something like wc -l directory/*.
But how do I achieve the same on AWS S3 using a terminal?
The output should be each file name and its line count.
For example, suppose a directory in S3 contains two files: 'abcd.txt' (5 lines) and 'efgh.txt' (10 lines). I want the line count of each file, from the terminal, without downloading the files. Desired output: 'abcd.txt' 5 'efgh.txt' 10
Upvotes: 0
Views: 2471
Reputation: 78553
In case it's helpful, here's a quick shell script that uses the AWS CLI (awscli).
#!/bin/bash
# List the object names under the prefix, then stream each object to wc -l.
FILES=$(aws s3 ls s3://mybucket/csv/ | tr -s ' ' | cut -d ' ' -f4)
for file in $FILES; do
  echo "$file, $(aws s3 cp "s3://mybucket/csv/$file" - | wc -l)"
done
Example output:
planets.csv, 8
countries.csv, 195
continents.csv, 7
Note that it effectively streams each file to stdout and pipes it through wc -l, so it doesn't persist any file locally. If you want to make it work recursively, or against collections of S3 objects that include non-text files, that would take a little additional work.
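As a rough sketch of the recursive case (the bucket name mybucket and the csv/ prefix are placeholders, as above), you could use aws s3 ls --recursive, which prints the full key of every object under the prefix:

```shell
#!/bin/bash
# Hypothetical recursive variant. Each output line of `aws s3 ls --recursive`
# looks like: "2023-01-01 12:00:00        123 csv/planets.csv", so squeezing
# repeated spaces and taking fields 4 onward recovers the object key
# (keys containing runs of spaces would still be mangled by tr).
aws s3 ls s3://mybucket/csv/ --recursive | tr -s ' ' | cut -d ' ' -f4- |
while read -r key; do
  # The key is relative to the bucket root, so prepend only the bucket.
  echo "$key, $(aws s3 cp "s3://mybucket/$key" - | wc -l)"
done
```

Filtering out non-text objects (e.g. by extension, with a case statement on "$key") would be the remaining bit of work.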
Upvotes: 1
Reputation: 269081
It is not possible with a simple command. Amazon S3 does not provide the ability to 'remotely' count the number of lines in an object.
Instead, you would need to download the files to your computer and then count the number of lines.
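A minimal sketch of that approach, with a placeholder bucket and object name:

```shell
# Download the object to a temporary local file, count its lines, clean up.
# s3://mybucket/abcd.txt is a placeholder path, not a real bucket.
aws s3 cp s3://mybucket/abcd.txt /tmp/abcd.txt
wc -l < /tmp/abcd.txt
rm /tmp/abcd.txt
```

Note that wc -l counts newline characters, so a file whose last line lacks a trailing newline reports one fewer line than you might expect.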
Upvotes: 1