Mihail

Reputation: 25

Search for part of string with grep in all files in folder and subfolders

I have .html files in directories and subdirectories. I need to extract the part that follows "example.com/" in every URL they contain. The surrounding text can look like this:

["https://example.com/folder1",
href="https://example.com/anotherfolder2" target="
etc.

What I want to extract is: folder1
anotherfolder2
etc.

from all files in all folders into one list, one word per line.

I found some highly upvoted examples on Stack Overflow, but they did not work. I tried this (adapted from those examples):

grep -Po '(?<=example.com=)[^,]*'

Thank you for your help!

Upvotes: 2

Views: 1043

Answers (2)

petrus4

Reputation: 614

echo "https://example.com/folder1" | tr -s '/' | tr '/' '\n' > file
sed -i '1d' file
sed -n '1p' file # This will give you example.com
sed -n '2p' file # This will give you folder1

sed -i 1s'@example\.com@newsite.com@' file
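printf 'http://' > nf          # printf, so no newline follows the scheme
cat file >> nf                 # append every line, including the host on line 1
paste -s -d '/' nf > newfile   # join the lines with "/" (no trailing slash)
cat newfile # This should be http://newsite.com/folder1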
rm -v ./nf
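
For comparison, the same split-and-rebuild can be sketched without temporary files by using POSIX parameter expansion. This is only an illustration, not part of the commands above; the variable names (url, rest, host, path, new_url) are made up for the example.

```shell
url="https://example.com/folder1"

rest=${url#*://}     # drop the scheme: shortest prefix ending in "://"
host=${rest%%/*}     # keep everything before the first "/"
path=${rest#*/}      # keep everything after the first "/"

# Rebuild the URL on the new host (newsite.com, as in the example above).
new_url="http://newsite.com/$path"

printf '%s\n' "$host" "$path" "$new_url"
```

This assumes the URL has a scheme and at least one "/" after the host; a bare "https://example.com" would need an extra check.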

Upvotes: 0

ramsay

Reputation: 3845

grep "example.com" your-directory -r | grep -o '".*"' | cut -d \" -f2| sed -e 's/https:\/\/example.com\///g'
  1. grep "example.com" your-directory -r | grep -o '".*"' | cut -d \" -f2 extracts the content of the first quoted string on each matching line
  2. sed -e 's/https:\/\/example.com\///g' strips the https://example.com/ prefix, leaving only the suffix
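
If GNU grep with PCRE support (-P) is available, the two steps can probably be folded into one call. The sketch below is an assumption-laden alternative, not the pipeline above: it builds a throwaway demo tree (the file names a.html and sub/b.html are hypothetical) from the two fragments in the question, then uses \K to drop the matched prefix so that only the first path segment is printed.

```shell
# Hypothetical demo tree; in practice point grep at your own directory.
dir=$(mktemp -d)
mkdir -p "$dir/sub"
printf '%s\n' '["https://example.com/folder1",' > "$dir/a.html"
printf '%s\n' 'href="https://example.com/anotherfolder2" target="' > "$dir/sub/b.html"

# -r recurse, -h omit file names, -o print only the match, -P use PCRE (GNU grep).
# \K discards everything matched so far; [^"/]+ stops at the next quote or slash.
result=$(grep -rhoP --include='*.html' 'https?://example\.com/\K[^"/]+' "$dir" | sort -u)

printf '%s\n' "$result"
rm -r "$dir"
```

sort -u is optional and only removes duplicates; drop it to keep every occurrence in the order grep finds them. Unlike the cut-based pipeline, this does not depend on the URL being the first quoted string on the line.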

Upvotes: 1
