Rachel

Reputation: 121

List directories and files at specific depth, when file paths are given in a text file

I am given a text file (filelist.txt) which lists all the files on a computer. It looks something like this:

hrw-rw-r-- user1/user1       0 2024-04-29 10:00 home/Desktop/
lrwxrwxrwx user1/user1       0 2024-08-13 10:00 home/Desktop/file0
hrw-rw-r-- user1/user1       0 2024-08-14 11:00 home/Documents/
hrw-rw-r-- user1/user1       0 2024-08-15 11:00 home/Documents/dir1/
-rw-rw-r-- user1/user1    2160 2024-08-16 12:00 home/Documents/dir1/dirA/file1
-rw-rw-r-- user1/user1    2160 2024-08-17 12:00 home/Documents/dir2/
-rw-rw-r-- user1/user1    2160 2024-08-18 11:00 home/Documents/dir2/dirB/file2
-rw-rw-r-- user1/user1    2160 2024-08-19 11:00 home/Documents/dir2/dirB/file3
-rw-rw-r-- user1/user1    2160 2024-08-20 10:00 home/Documents/dir2/dirB/file4

I would like to list the directories and files at a certain depth. For example, I would like to list whatever sits directly below 'Documents', and the ideal output would be something like this:

home/Documents/dir1/
home/Documents/dir2/
home/Documents/dir3/

or something like the following would be fine too:

hrw-rw-r-- user1/user1       0 2024-08-15 11:00 home/Documents/dir1/
-rw-rw-r-- user1/user1    2160 2024-08-17 12:00 home/Documents/dir2/
-rw-rw-r-- user1/user1    2160 2024-08-17 12:00 home/Documents/dir3/

I have tried with grep like this but nothing came out:

grep 'home\/Documents\/[^\/]\/$ filelist.txt

Any ideas? I'm using the Mac terminal, and any solution that works there or in Python would be appreciated. Thank you in advance!

Upvotes: 3

Views: 80

Answers (4)

Ed Morton

Reputation: 204446

You could do this using any sed and sort:

$ sed -n 's:.* \(home/Documents/[^/]*/\).*:\1:p' file | sort -u
home/Documents/dir1/
home/Documents/dir2/

but it'll fail (produce undesirable output) if you have input like home/Desktop/ home/Documents/dir/, i.e. a single path that contains a space followed by home/Documents/. I suspect all other answers will fail too given various possible inputs other than those in the sample input in the question.
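
Since the question also allows Python, here is a minimal sketch of the same extract-and-deduplicate idea. The function name first_level_dirs, the inline sample lines, and the TRICKY line are illustrative assumptions rather than part of the original answer. The sketch shares the weakness described above for a path that contains a space.

```python
import re

# A few lines in the format shown in the question (assumed sample data).
SAMPLE = [
    "lrwxrwxrwx user1/user1       0 2024-08-13 10:00 home/Desktop/file0",
    "hrw-rw-r-- user1/user1       0 2024-08-14 11:00 home/Documents/",
    "hrw-rw-r-- user1/user1       0 2024-08-15 11:00 home/Documents/dir1/",
    "-rw-rw-r-- user1/user1    2160 2024-08-16 12:00 home/Documents/dir1/dirA/file1",
    "-rw-rw-r-- user1/user1    2160 2024-08-17 12:00 home/Documents/dir2/",
    "-rw-rw-r-- user1/user1    2160 2024-08-18 11:00 home/Documents/dir2/dirB/file2",
]

# Hypothetical line whose single path contains a space.
TRICKY = "-rw-rw-r-- user1/user1    2160 2024-08-20 10:00 home/Desktop/ home/Documents/dir/"

# Same shape as the sed expression: anything, a space, then the prefix
# plus exactly one further path component ending in "/".
PATTERN = re.compile(r".* (home/Documents/[^/]*/)")


def first_level_dirs(lines):
    """Return the sorted, de-duplicated first-level directories (like sort -u)."""
    found = set()
    for line in lines:
        match = PATTERN.match(line)
        if match:
            found.add(match.group(1))
    return sorted(found)


print(first_level_dirs(SAMPLE))
# The tricky line is reported as a first-level directory even though the
# real path starts with home/Desktop/.
print(first_level_dirs([TRICKY]))
```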

Regarding the grep command from the question:

grep 'home\/Documents\/[^\/]\/$ filelist.txt

  1. It's missing the terminating ' for the regexp, which I assume is a typo.
  2. / isn't a regexp metacharacter, so you shouldn't escape it in a regexp: the meaning of \/ is undefined. Some tools use / as a regexp delimiter, and there you need to escape it (or use a different delimiter) to accommodate that tool's syntax, but grep isn't one of those tools.
  3. [^/] means exactly 1 non-/ character. You want to allow for 1 or more so you'd need to add some kind of repetition instruction, e.g. add + to the regexp and add the -E option to enable EREs since + is an ERE metachar.
  4. The $ on the end anchors it to only match lines that end with / but that's not what your input/output show you want so get rid of that $.

Given the above your grep command should have been written:

grep -E 'home/Documents/[^/]+/' filelist.txt

I'm not suggesting that will work, just that it's the right syntax for the command the OP wrote.
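
To see the effect of points 3 and 4, the three pattern variants can be compared on a few lines from the question. This is only a sketch in Python: Python's re syntax is not identical to grep's BRE/ERE, but [^/], + and $ behave the same way for these inputs. The helper matching_paths is a hypothetical name introduced for the illustration.

```python
import re

lines = [
    "hrw-rw-r-- user1/user1       0 2024-08-14 11:00 home/Documents/",
    "hrw-rw-r-- user1/user1       0 2024-08-15 11:00 home/Documents/dir1/",
    "-rw-rw-r-- user1/user1    2160 2024-08-16 12:00 home/Documents/dir1/dirA/file1",
]


def matching_paths(pattern, lines):
    """Return the last whitespace-separated field of every line the pattern matches."""
    rx = re.compile(pattern)
    return [line.split()[-1] for line in lines if rx.search(line)]


# Exactly one non-"/" character, anchored at the end: "dir1" is four characters, so no match.
single_char = matching_paths(r"home/Documents/[^/]/$", lines)

# One or more non-"/" characters, still anchored: only lines ending right after that "/".
anchored = matching_paths(r"home/Documents/[^/]+/$", lines)

# One or more non-"/" characters, not anchored: deeper paths match as well.
unanchored = matching_paths(r"home/Documents/[^/]+/", lines)

print(single_char)
print(anchored)
print(unanchored)
```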

Upvotes: 1

user28909846

Reputation: 21

I added some more lines to the file, like this:

$ cat filelist.txt
hrw-rw-r-- user1/user1       0 2024-04-29 10:00 home/Desktop/
lrwxrwxrwx user1/user1       0 2024-08-13 10:00 home/Desktop/file0
hrw-rw-r-- user1/user1       0 2024-08-14 11:00 home/Documents/
hrw-rw-r-- user1/user1       0 2024-08-15 11:00 home/Documents/dir1/
-rw-rw-r-- user1/user1    2160 2024-08-16 12:00 home/Documents/dir1/dirA/file1
-rw-rw-r-- user1/user1    2160 2024-08-17 12:00 home/Documents/dir2/
-rw-rw-r-- user1/user1    2160 2024-08-18 11:00 home/Documents/dir2/dirB/file2
-rw-rw-r-- user1/user1    2160 2024-08-19 11:00 home/Documents/dir2/dirB/file3
-rw-rw-r-- user1/user1    2160 2024-08-20 10:00 home/Documents/dir2/dirB/file4
-rw-rw-r-- user1/user1    2160 2024-08-20 10:00 home/Documents/file5
-rw-rw-r-- user1/user1    2160 2024-08-20 10:00 home/Documents/dir_X/dir_Y/file_a
-rw-rw-r-- user1/user1    2160 2024-08-20 10:00 home/Documents/dir_X/dir_Y/
-rw-rw-r-- user1/user1    2160 2024-08-20 10:00 home/Documents/dir_X/file_123
-rw-rw-r-- user1/user1    2160 2024-08-20 10:00 home/Documents/dir_X/dir_Y/file_b
-rw-rw-r-- user1/user1    2160 2024-08-20 10:00 home/Documents/dir_X/dir_X/file_c

I then allow searching for different directory names: the value for name is provided on the command line, and it is matched only against whole path components:

$ cat file.awk
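{
        # Split the path (field 6) on "/"; a trailing "/" yields an empty last element
        count = split($6, a, "/")
        for (i = 1; i <= count; i++) {
                if (a[i] == name) {
                        if (count - i == 1 && a[count] != "") {
                                # a file directly below name
                                print
                        } else if (count - i == 2 && a[count] == "") {
                                # a directory directly below name
                                print
                        }
                }
        }
}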

I then run it with different names:

for name in Documents Docu Documents2 dir2 dirB dir_X dir_Y; do
    echo "::::::: $name"
    awk -f file.awk -v name="$name" filelist.txt
done
::::::: Documents
hrw-rw-r-- user1/user1       0 2024-08-15 11:00 home/Documents/dir1/
-rw-rw-r-- user1/user1    2160 2024-08-17 12:00 home/Documents/dir2/
-rw-rw-r-- user1/user1    2160 2024-08-20 10:00 home/Documents/file5
::::::: Docu
::::::: Documents2
::::::: dir2
::::::: dirB
-rw-rw-r-- user1/user1    2160 2024-08-18 11:00 home/Documents/dir2/dirB/file2
-rw-rw-r-- user1/user1    2160 2024-08-19 11:00 home/Documents/dir2/dirB/file3
-rw-rw-r-- user1/user1    2160 2024-08-20 10:00 home/Documents/dir2/dirB/file4
::::::: dir_X
-rw-rw-r-- user1/user1    2160 2024-08-20 10:00 home/Documents/dir_X/dir_Y/
-rw-rw-r-- user1/user1    2160 2024-08-20 10:00 home/Documents/dir_X/file_123
-rw-rw-r-- user1/user1    2160 2024-08-20 10:00 home/Documents/dir_X/dir_X/file_c
::::::: dir_Y
-rw-rw-r-- user1/user1    2160 2024-08-20 10:00 home/Documents/dir_X/dir_Y/file_a
-rw-rw-r-- user1/user1    2160 2024-08-20 10:00 home/Documents/dir_X/dir_Y/file_b
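
For anyone who would rather use Python, the same component-matching logic might be sketched as follows. The function name direct_children and the shortened LINES sample are assumptions made for this illustration. As in the awk script, the path is taken to be the sixth whitespace-separated field, so paths containing spaces are not handled.

```python
# Shortened sample in the format of filelist.txt (assumed data).
LINES = [
    "hrw-rw-r-- user1/user1       0 2024-08-14 11:00 home/Documents/",
    "hrw-rw-r-- user1/user1       0 2024-08-15 11:00 home/Documents/dir1/",
    "-rw-rw-r-- user1/user1    2160 2024-08-16 12:00 home/Documents/dir1/dirA/file1",
    "-rw-rw-r-- user1/user1    2160 2024-08-20 10:00 home/Documents/file5",
    "-rw-rw-r-- user1/user1    2160 2024-08-20 10:00 home/Documents/dir_X/dir_X/file_c",
]


def direct_children(name, lines):
    """Return the lines whose path is a file or directory directly below `name`."""
    result = []
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # not a listing line
        # Like awk's split(): a trailing "/" produces an empty last element.
        parts = fields[5].split("/")
        count = len(parts)
        for i, part in enumerate(parts, start=1):
            if part != name:
                continue  # whole-component match only, so "Docu" never matches
            is_file_child = count - i == 1 and parts[-1] != ""
            is_dir_child = count - i == 2 and parts[-1] == ""
            if is_file_child or is_dir_child:
                result.append(line)
    return result


for name in ("Documents", "Docu", "dir_X"):
    print(":::::::", name)
    for line in direct_children(name, LINES):
        print(line)
```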

Upvotes: 2

Daweo

Reputation: 36725

Regarding your original code

grep 'home\/Documents\/[^\/]\/$ filelist.txt

First, the closing ' is missing, so the shell keeps waiting for the quote to be closed. After fixing that,

grep 'home\/Documents\/[^\/]\/$' filelist.txt

will give empty output, because [^\/] matches exactly one character. Add * to allow zero or more of them, like this:

grep 'home\/Documents\/[^\/]*\/$' filelist.txt

Then, given this filelist.txt content

hrw-rw-r-- user1/user1       0 2024-04-29 10:00 home/Desktop/
lrwxrwxrwx user1/user1       0 2024-08-13 10:00 home/Desktop/file0
hrw-rw-r-- user1/user1       0 2024-08-14 11:00 home/Documents/
hrw-rw-r-- user1/user1       0 2024-08-15 11:00 home/Documents/dir1/
-rw-rw-r-- user1/user1    2160 2024-08-16 12:00 home/Documents/dir1/dirA/file1
-rw-rw-r-- user1/user1    2160 2024-08-17 12:00 home/Documents/dir2/
-rw-rw-r-- user1/user1    2160 2024-08-18 11:00 home/Documents/dir2/dirB/file2
-rw-rw-r-- user1/user1    2160 2024-08-19 11:00 home/Documents/dir2/dirB/file3
-rw-rw-r-- user1/user1    2160 2024-08-20 10:00 home/Documents/dir2/dirB/file4

the output will be

hrw-rw-r-- user1/user1       0 2024-08-15 11:00 home/Documents/dir1/
-rw-rw-r-- user1/user1    2160 2024-08-17 12:00 home/Documents/dir2/

with the matched paths highlighted if your grep is set up to color matches (e.g. via --color=auto)

Upvotes: 2

Arkadiusz Drabczyk

Reputation: 12528

You can use awk:

$ awk '$6 ~ "^home/Documents/.+" {if (split($6, arr, "/") <= 4) {print $6}}' filelist.txt
home/Documents/dir1/
home/Documents/dir2/
home/Documents/FILE

The last line shows that this method also works with files, that is, with lines that do not have a terminating '/':

$ tail -n1 filelist.txt
hrw-rw-r-- user1/user1       0 2024-08-15 11:00 home/Documents/FILE
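
A comparable depth check can be sketched in Python, which the question also accepts. The function entries_below, its depth parameter, and the sample LINES are assumptions made for this illustration. It counts the components after the prefix instead of counting split fields, because a split-count test of <= 4 would also accept a path such as home/Documents/dir_X/file_123, which splits into four fields but sits two levels below Documents.

```python
# Sample in the format of filelist.txt (assumed data).
LINES = [
    "hrw-rw-r-- user1/user1       0 2024-08-14 11:00 home/Documents/",
    "hrw-rw-r-- user1/user1       0 2024-08-15 11:00 home/Documents/dir1/",
    "-rw-rw-r-- user1/user1    2160 2024-08-16 12:00 home/Documents/dir1/dirA/file1",
    "-rw-rw-r-- user1/user1    2160 2024-08-17 12:00 home/Documents/dir2/",
    "-rw-rw-r-- user1/user1    2160 2024-08-20 10:00 home/Documents/dir_X/file_123",
    "hrw-rw-r-- user1/user1       0 2024-08-15 11:00 home/Documents/FILE",
]


def entries_below(prefix, lines, depth=1):
    """Return the paths that sit exactly `depth` levels below `prefix`."""
    prefix = prefix.rstrip("/") + "/"
    result = []
    for line in lines:
        # Split off the first five columns; the remainder is the path,
        # so a path containing spaces stays in one piece.
        fields = line.strip().split(None, 5)
        if len(fields) < 6:
            continue  # not a listing line
        path = fields[5]
        if not path.startswith(prefix):
            continue
        # Components after the prefix, ignoring a trailing "/" on directories.
        components = [c for c in path[len(prefix):].split("/") if c]
        if len(components) == depth:
            result.append(path)
    return result


print(entries_below("home/Documents", LINES))
print(entries_below("home/Documents", LINES, depth=2))
```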

Upvotes: 3
