Reputation: 121
I am given a text file (filelist.txt) which lists all the files on a computer. It looks something like this:
hrw-rw-r-- user1/user1 0 2024-04-29 10:00 home/Desktop/
lrwxrwxrwx user1/user1 0 2024-08-13 10:00 home/Desktop/file0
hrw-rw-r-- user1/user1 0 2024-08-14 11:00 home/Documents/
hrw-rw-r-- user1/user1 0 2024-08-15 11:00 home/Documents/dir1/
-rw-rw-r-- user1/user1 2160 2024-08-16 12:00 home/Documents/dir1/dirA/file1
-rw-rw-r-- user1/user1 2160 2024-08-17 12:00 home/Documents/dir2/
-rw-rw-r-- user1/user1 2160 2024-08-18 11:00 home/Documents/dir2/dirB/file2
-rw-rw-r-- user1/user1 2160 2024-08-19 11:00 home/Documents/dir2/dirB/file3
-rw-rw-r-- user1/user1 2160 2024-08-20 10:00 home/Documents/dir2/dirB/file4
I would like to list the directories and files at a certain depth. For example, I would like to list what is directly below 'Documents', and the ideal output would be something like this:
home/Documents/dir1/
home/Documents/dir2/
home/Documents/dir3/
Or something like this would be fine too:
hrw-rw-r-- user1/user1 0 2024-08-15 11:00 home/Documents/dir1/
-rw-rw-r-- user1/user1 2160 2024-08-17 12:00 home/Documents/dir2/
-rw-rw-r-- user1/user1 2160 2024-08-17 12:00 home/Documents/dir3/
I tried grep like this, but nothing came out:
grep 'home\/Documents\/[^\/]\/$ filelist.txt
Any ideas? I'm using the Mac Terminal, but any solution using the Terminal or Python would be appreciated. Thank you in advance!
Upvotes: 3
Views: 80
Reputation: 204446
You could do this using any sed and sort:
$ sed -n 's:.* \(home/Documents/[^/]*/\).*:\1:p' file | sort -u
home/Documents/dir1/
home/Documents/dir2/
but it'll fail (produce undesirable output) if you have input like home/Desktop/ home/Documents/dir/ (a path containing a space). I suspect all the other answers will fail too, given various possible inputs other than those in the sample input in the question.
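Since the question also welcomes Python, here is a rough Python sketch of the same sed | sort -u pipeline (an illustration, not part of the original answer; the function name subdirs_of_documents is hypothetical and the sample lines are the ones from the question). re.match with the same greedy ".* " prefix captures home/Documents/<name>/ from each line, and sorted(set(...)) plays the role of sort -u. Because it uses the same regexp, it shares the caveat above about paths containing spaces.

```python
import re

# Sample lines from the question's filelist.txt.
SAMPLE = """\
hrw-rw-r-- user1/user1 0 2024-04-29 10:00 home/Desktop/
lrwxrwxrwx user1/user1 0 2024-08-13 10:00 home/Desktop/file0
hrw-rw-r-- user1/user1 0 2024-08-14 11:00 home/Documents/
hrw-rw-r-- user1/user1 0 2024-08-15 11:00 home/Documents/dir1/
-rw-rw-r-- user1/user1 2160 2024-08-16 12:00 home/Documents/dir1/dirA/file1
-rw-rw-r-- user1/user1 2160 2024-08-17 12:00 home/Documents/dir2/
-rw-rw-r-- user1/user1 2160 2024-08-18 11:00 home/Documents/dir2/dirB/file2
-rw-rw-r-- user1/user1 2160 2024-08-19 11:00 home/Documents/dir2/dirB/file3
-rw-rw-r-- user1/user1 2160 2024-08-20 10:00 home/Documents/dir2/dirB/file4
""".splitlines()

# Same regexp as the sed command: the greedy ".* " backtracks to the last
# space followed by home/Documents/<name>/, and the group captures that prefix.
PREFIX_RE = re.compile(r".* (home/Documents/[^/]*/)")


def subdirs_of_documents(lines):
    """Collect the captured prefixes, then dedupe and sort them like sort -u."""
    found = set()
    for line in lines:
        m = PREFIX_RE.match(line)  # like sed -n ... p: only matching lines count
        if m:
            found.add(m.group(1))
    return sorted(found)


if __name__ == "__main__":
    for path in subdirs_of_documents(SAMPLE):
        print(path)
```

Like the sed version, this reports only subdirectories, and it finds them even when only deeper paths (e.g. .../dir1/dirA/file1) are listed.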
Regarding the grep command from the question:
grep 'home\/Documents\/[^\/]\/$ filelist.txt
You're missing the closing ' for the regexp, which I assume is a typo.
/ isn't a regexp metacharacter, so you shouldn't escape it in a regexp, as that turns it into undefined behavior. You need to escape it in some tools which use / as a regexp delimiter to accommodate that tool's syntax (or use a different delimiter), but grep isn't one of those tools.
[^/] means exactly 1 non-/ character. You want to allow for 1 or more, so you'd need to add some kind of repetition instruction, e.g. add + to the regexp and add the -E option to enable EREs, since + is an ERE metachar.
$ on the end anchors it to only match lines that end with /, but that's not what your input/output show you want, so get rid of that $.
Given the above, your grep command should have been written:
grep -E 'home/Documents/[^/]+/' filelist.txt
I'm not suggesting that will work, just that it's the right syntax for the command the OP wrote.
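To illustrate that last point, here is a small Python sketch that applies the same ERE with re.search (used as a stand-in for grep -E, which also matches anywhere in the line) to the question's sample lines. The helper name grep_e is hypothetical. It selects home/Documents/dir1/ and home/Documents/dir2/, but also every deeper entry such as home/Documents/dir1/dirA/file1, which is why the corrected syntax alone doesn't give the desired output.

```python
import re

# Sample lines from the question's filelist.txt.
SAMPLE = """\
hrw-rw-r-- user1/user1 0 2024-04-29 10:00 home/Desktop/
lrwxrwxrwx user1/user1 0 2024-08-13 10:00 home/Desktop/file0
hrw-rw-r-- user1/user1 0 2024-08-14 11:00 home/Documents/
hrw-rw-r-- user1/user1 0 2024-08-15 11:00 home/Documents/dir1/
-rw-rw-r-- user1/user1 2160 2024-08-16 12:00 home/Documents/dir1/dirA/file1
-rw-rw-r-- user1/user1 2160 2024-08-17 12:00 home/Documents/dir2/
-rw-rw-r-- user1/user1 2160 2024-08-18 11:00 home/Documents/dir2/dirB/file2
-rw-rw-r-- user1/user1 2160 2024-08-19 11:00 home/Documents/dir2/dirB/file3
-rw-rw-r-- user1/user1 2160 2024-08-20 10:00 home/Documents/dir2/dirB/file4
""".splitlines()

# The corrected ERE; "/" needs no escaping in Python's re either.
ERE = re.compile(r"home/Documents/[^/]+/")


def grep_e(lines):
    """Return the lines the ERE matches anywhere, as grep -E would print them."""
    return [line for line in lines if ERE.search(line)]


if __name__ == "__main__":
    for line in grep_e(SAMPLE):
        print(line)
```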
Upvotes: 1
Reputation: 21
I added some more lines to the file, like this:
$ cat filelist.txt
hrw-rw-r-- user1/user1 0 2024-04-29 10:00 home/Desktop/
lrwxrwxrwx user1/user1 0 2024-08-13 10:00 home/Desktop/file0
hrw-rw-r-- user1/user1 0 2024-08-14 11:00 home/Documents/
hrw-rw-r-- user1/user1 0 2024-08-15 11:00 home/Documents/dir1/
-rw-rw-r-- user1/user1 2160 2024-08-16 12:00 home/Documents/dir1/dirA/file1
-rw-rw-r-- user1/user1 2160 2024-08-17 12:00 home/Documents/dir2/
-rw-rw-r-- user1/user1 2160 2024-08-18 11:00 home/Documents/dir2/dirB/file2
-rw-rw-r-- user1/user1 2160 2024-08-19 11:00 home/Documents/dir2/dirB/file3
-rw-rw-r-- user1/user1 2160 2024-08-20 10:00 home/Documents/dir2/dirB/file4
-rw-rw-r-- user1/user1 2160 2024-08-20 10:00 home/Documents/file5
-rw-rw-r-- user1/user1 2160 2024-08-20 10:00 home/Documents/dir_X/dir_Y/file_a
-rw-rw-r-- user1/user1 2160 2024-08-20 10:00 home/Documents/dir_X/dir_Y/
-rw-rw-r-- user1/user1 2160 2024-08-20 10:00 home/Documents/dir_X/file_123
-rw-rw-r-- user1/user1 2160 2024-08-20 10:00 home/Documents/dir_X/dir_Y/file_b
-rw-rw-r-- user1/user1 2160 2024-08-20 10:00 home/Documents/dir_X/dir_X/file_c
I then allow searching for different strings like this, where I have to provide the value for name on the command line, and only whole path components are matched (so Docu does not match Documents):
$ cat file.awk
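{
    # split the path ($6) into its "/"-separated components
    count = split($6, a, "/")
    for (i = 1; i <= count; i++) {
        if (a[i] == name) {
            # a file directly below name: one non-empty component follows it
            if (count - i == 1 && a[count] != "") {
                print
            } else if (count - i == 2 && a[count] == "") {  # a directory directly below name (its trailing "/" leaves an empty last component)
                print
            }
        }
    }
}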
I then run it with different names:
for name in Documents Docu Documents2 dir2 dirB dir_X dir_Y; do
echo "::::::: $name"
awk -f file.awk -v name="$name" filelist.txt
done
::::::: Documents
hrw-rw-r-- user1/user1 0 2024-08-15 11:00 home/Documents/dir1/
-rw-rw-r-- user1/user1 2160 2024-08-17 12:00 home/Documents/dir2/
-rw-rw-r-- user1/user1 2160 2024-08-20 10:00 home/Documents/file5
::::::: Docu
::::::: Documents2
::::::: dir2
::::::: dirB
-rw-rw-r-- user1/user1 2160 2024-08-18 11:00 home/Documents/dir2/dirB/file2
-rw-rw-r-- user1/user1 2160 2024-08-19 11:00 home/Documents/dir2/dirB/file3
-rw-rw-r-- user1/user1 2160 2024-08-20 10:00 home/Documents/dir2/dirB/file4
::::::: dir_X
-rw-rw-r-- user1/user1 2160 2024-08-20 10:00 home/Documents/dir_X/dir_Y/
-rw-rw-r-- user1/user1 2160 2024-08-20 10:00 home/Documents/dir_X/file_123
-rw-rw-r-- user1/user1 2160 2024-08-20 10:00 home/Documents/dir_X/dir_X/file_c
::::::: dir_Y
-rw-rw-r-- user1/user1 2160 2024-08-20 10:00 home/Documents/dir_X/dir_Y/file_a
-rw-rw-r-- user1/user1 2160 2024-08-20 10:00 home/Documents/dir_X/dir_Y/file_b
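For readers who prefer Python (as the question allows), here is a hypothetical line-by-line port of file.awk, a sketch rather than the answer's own code; the function name below is made up for illustration. It assumes, like $6 in the awk script, that the path is the sixth whitespace-separated field (so paths must not contain spaces). str.split("/") keeps the trailing empty component of a directory path, just as awk's split() does here, so the two depth checks carry over unchanged.

```python
# Sample lines from the extended filelist.txt above.
SAMPLE = """\
hrw-rw-r-- user1/user1 0 2024-04-29 10:00 home/Desktop/
lrwxrwxrwx user1/user1 0 2024-08-13 10:00 home/Desktop/file0
hrw-rw-r-- user1/user1 0 2024-08-14 11:00 home/Documents/
hrw-rw-r-- user1/user1 0 2024-08-15 11:00 home/Documents/dir1/
-rw-rw-r-- user1/user1 2160 2024-08-16 12:00 home/Documents/dir1/dirA/file1
-rw-rw-r-- user1/user1 2160 2024-08-17 12:00 home/Documents/dir2/
-rw-rw-r-- user1/user1 2160 2024-08-18 11:00 home/Documents/dir2/dirB/file2
-rw-rw-r-- user1/user1 2160 2024-08-19 11:00 home/Documents/dir2/dirB/file3
-rw-rw-r-- user1/user1 2160 2024-08-20 10:00 home/Documents/dir2/dirB/file4
-rw-rw-r-- user1/user1 2160 2024-08-20 10:00 home/Documents/file5
-rw-rw-r-- user1/user1 2160 2024-08-20 10:00 home/Documents/dir_X/dir_Y/file_a
-rw-rw-r-- user1/user1 2160 2024-08-20 10:00 home/Documents/dir_X/dir_Y/
-rw-rw-r-- user1/user1 2160 2024-08-20 10:00 home/Documents/dir_X/file_123
-rw-rw-r-- user1/user1 2160 2024-08-20 10:00 home/Documents/dir_X/dir_Y/file_b
-rw-rw-r-- user1/user1 2160 2024-08-20 10:00 home/Documents/dir_X/dir_X/file_c
""".splitlines()


def below(lines, name):
    """Port of file.awk: lines one level below a path component equal to name."""
    out = []
    for line in lines:
        fields = line.split()  # whitespace splitting, like awk's default FS
        if len(fields) < 6:
            continue
        a = fields[5].split("/")  # like split($6, a, "/"); keeps a trailing ""
        count = len(a)
        for i in range(1, count + 1):  # 1-based index, like the awk loop
            if a[i - 1] != name:
                continue
            if count - i == 1 and a[count - 1] != "":
                out.append(line)  # file directly below name
            elif count - i == 2 and a[count - 1] == "":
                out.append(line)  # directory directly below name
    return out


if __name__ == "__main__":
    for name in ["Documents", "Docu", "Documents2", "dir2", "dirB", "dir_X", "dir_Y"]:
        print("::::::: " + name)
        for line in below(SAMPLE, name):
            print(line)
```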
Upvotes: 2
Reputation: 36725
Regarding your original code:
grep 'home\/Documents\/[^\/]\/$ filelist.txt
Firstly, you are missing the closing ', so the shell keeps waiting for the quote to be closed. After fixing that,
grep 'home\/Documents\/[^\/]\/$' filelist.txt
will give empty output, because [^\/] denotes a single character. You should add * to express zero-or-more repetitions, like this:
grep 'home\/Documents\/[^\/]*\/$' filelist.txt
Then, for filelist.txt with this content:
hrw-rw-r-- user1/user1 0 2024-04-29 10:00 home/Desktop/
lrwxrwxrwx user1/user1 0 2024-08-13 10:00 home/Desktop/file0
hrw-rw-r-- user1/user1 0 2024-08-14 11:00 home/Documents/
hrw-rw-r-- user1/user1 0 2024-08-15 11:00 home/Documents/dir1/
-rw-rw-r-- user1/user1 2160 2024-08-16 12:00 home/Documents/dir1/dirA/file1
-rw-rw-r-- user1/user1 2160 2024-08-17 12:00 home/Documents/dir2/
-rw-rw-r-- user1/user1 2160 2024-08-18 11:00 home/Documents/dir2/dirB/file2
-rw-rw-r-- user1/user1 2160 2024-08-19 11:00 home/Documents/dir2/dirB/file3
-rw-rw-r-- user1/user1 2160 2024-08-20 10:00 home/Documents/dir2/dirB/file4
the output will be:
hrw-rw-r-- user1/user1 0 2024-08-15 11:00 home/Documents/dir1/
-rw-rw-r-- user1/user1 2160 2024-08-17 12:00 home/Documents/dir2/
with the matching paths highlighted in color by default.
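The same fixed pattern can be tried from Python with re.search (a sketch; the helper name direct_subdirs is hypothetical, the sample lines are the question's, and the \/ escapes are dropped because / is not special in Python regexps). Because the pattern ends in /$, only directory entries are returned, so a file directly under Documents would not be listed; and since * also allows zero characters, a path such as home/Documents// would match as well.

```python
import re

# Sample lines from the question's filelist.txt.
SAMPLE = """\
hrw-rw-r-- user1/user1 0 2024-04-29 10:00 home/Desktop/
lrwxrwxrwx user1/user1 0 2024-08-13 10:00 home/Desktop/file0
hrw-rw-r-- user1/user1 0 2024-08-14 11:00 home/Documents/
hrw-rw-r-- user1/user1 0 2024-08-15 11:00 home/Documents/dir1/
-rw-rw-r-- user1/user1 2160 2024-08-16 12:00 home/Documents/dir1/dirA/file1
-rw-rw-r-- user1/user1 2160 2024-08-17 12:00 home/Documents/dir2/
-rw-rw-r-- user1/user1 2160 2024-08-18 11:00 home/Documents/dir2/dirB/file2
-rw-rw-r-- user1/user1 2160 2024-08-19 11:00 home/Documents/dir2/dirB/file3
-rw-rw-r-- user1/user1 2160 2024-08-20 10:00 home/Documents/dir2/dirB/file4
""".splitlines()

# The fixed grep pattern without the "\/" escapes:
# "[^/]*" = zero or more non-slash chars, "/$" = the line must end with "/".
PATTERN = re.compile(r"home/Documents/[^/]*/$")


def direct_subdirs(lines):
    """Return the lines grep would print for this pattern."""
    return [line for line in lines if PATTERN.search(line)]


if __name__ == "__main__":
    for line in direct_subdirs(SAMPLE):
        print(line)
```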
Upvotes: 2
Reputation: 12528
You can use awk:
$ awk '$6 ~ "^home/Documents/.+" {if (split($6, arr, "/") <= 4) {print $6}}' filelist.txt
home/Documents/dir1/
home/Documents/dir2/
home/Documents/FILE
The last line shows that this method also works correctly for files, i.e. lines that do not have a terminating '/':
$ tail -n1 filelist.txt
hrw-rw-r-- user1/user1 0 2024-08-15 11:00 home/Documents/FILE
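A Python sketch of the same split-and-count idea follows, with two deliberate changes, so it is a variant and not the answer's awk. First, the parent directory is a parameter (parent, a hypothetical name defaulting to home/Documents). Second, one trailing / is removed before counting, so the test becomes "exactly one component more than the parent". With the plain <= 4 check, a hypothetical file such as home/Documents/dir1/file would also split into 4 parts and be listed even though it is two levels down; stripping the slash first avoids that. As in the awk, the path is assumed to be the sixth whitespace-separated field.

```python
# Sample lines from the question's filelist.txt.
SAMPLE = """\
hrw-rw-r-- user1/user1 0 2024-04-29 10:00 home/Desktop/
lrwxrwxrwx user1/user1 0 2024-08-13 10:00 home/Desktop/file0
hrw-rw-r-- user1/user1 0 2024-08-14 11:00 home/Documents/
hrw-rw-r-- user1/user1 0 2024-08-15 11:00 home/Documents/dir1/
-rw-rw-r-- user1/user1 2160 2024-08-16 12:00 home/Documents/dir1/dirA/file1
-rw-rw-r-- user1/user1 2160 2024-08-17 12:00 home/Documents/dir2/
-rw-rw-r-- user1/user1 2160 2024-08-18 11:00 home/Documents/dir2/dirB/file2
-rw-rw-r-- user1/user1 2160 2024-08-19 11:00 home/Documents/dir2/dirB/file3
-rw-rw-r-- user1/user1 2160 2024-08-20 10:00 home/Documents/dir2/dirB/file4
""".splitlines()


def entries_below(lines, parent="home/Documents"):
    """Paths exactly one level below parent; directories keep their trailing "/"."""
    parent = parent.rstrip("/")
    prefix = parent + "/"
    depth = len(parent.split("/")) + 1  # component count of a direct child
    out = []
    for line in lines:
        fields = line.split()
        if len(fields) < 6:
            continue  # not a listing line
        path = fields[5]  # like awk's $6; assumes no spaces in paths
        if not path.startswith(prefix) or path == prefix:
            continue  # outside parent, or the parent entry itself
        # Drop one trailing "/" so "dir1/" and "FILE" both end in a real name.
        trimmed = path[:-1] if path.endswith("/") else path
        if len(trimmed.split("/")) == depth:
            out.append(path)
    return out


if __name__ == "__main__":
    for path in entries_below(SAMPLE):
        print(path)
```

Calling entries_below(SAMPLE, "home") would instead list home/Desktop/ and home/Documents/.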
Upvotes: 3