reemas

Reputation: 11

How to get a line count of all individual files in a directory on AWS S3 using a terminal?

I am new to terminal commands. I know we could do something like wc -l directory/* if the files were local, but how do I achieve the same for files on AWS S3 from the terminal? The output should be the file name and its line count.

For example, there are two files present in a directory in S3 - 'abcd.txt' (5 lines in the file) and 'efgh.txt' (10 lines in the file). I want the line count of each file without downloading the files, using the terminal. The output should be:

'abcd.txt' 5
'efgh.txt' 10

Upvotes: 0

Views: 2471

Answers (2)

jarmod

Reputation: 78553

In case it's helpful, here's a quick shell script that uses the awscli.

#!/bin/bash

# List the objects under the prefix and take the 4th field (the file name).
FILES=$(aws s3 ls s3://mybucket/csv/ | tr -s ' ' | cut -d ' ' -f4)

# Stream each object to stdout and count its lines; nothing is written to disk.
for file in $FILES; do
    echo "$file, $(aws s3 cp "s3://mybucket/csv/$file" - | wc -l)"
done

Example of output:

planets.csv, 8
countries.csv, 195
continents.csv, 7

Note that it streams each object to stdout and counts the lines there, so no file is persisted locally. If you want it to work recursively, or to handle collections of S3 objects that include non-text files, that would take a little additional work; a rough recursive variant is sketched below.
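A minimal sketch of the recursive case, along the same lines (the bucket and prefix are the same placeholders as above; adjust for your own):

#!/bin/bash

# Recursively list every object key under the prefix; the key is the
# 4th whitespace-separated field onward of each 'aws s3 ls' line.
aws s3 ls s3://mybucket/csv/ --recursive | tr -s ' ' | cut -d ' ' -f4- |
while read -r key; do
    # Keys are relative to the bucket root, so copy from s3://mybucket/$key.
    echo "$key, $(aws s3 cp "s3://mybucket/$key" - | wc -l)"
done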

Upvotes: 1

John Rotenstein

Reputation: 269081

It is not possible with a simple command. Amazon S3 does not provide the ability to 'remotely' count the number of lines in an object.

Instead, you would need to download the files to your computer and then count the number of lines.
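For example, a minimal sketch of that approach (the bucket name, prefix, and local directory here are placeholders, not anything from the question):

#!/bin/bash

# Copy the objects to a local directory, then count lines per file.
mkdir -p /tmp/s3-linecount
aws s3 sync s3://mybucket/csv/ /tmp/s3-linecount
wc -l /tmp/s3-linecount/*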

Upvotes: 1
