M. Beausoleil

Reputation: 3555

Using Sed after grep to replace inline with an HTML prefix

I have some text in which I want to replace relative file names with actual links.

The text looks like this:

Some text here 
[...]
-   CRAN Task View: [Bayesian](Bayesian.html)
-   CRAN Task View: [Cluster](Cluster.html)
-   CRAN Task View: [Databases](Databases.html)
-   CRAN Task View: [Environmetrics](Environmetrics.html)
[...]
End of text here

But as you can see, the links are only relative file names, not full URLs to the pages. E.g., Bayesian.html should become http://cran.rstudio.com/web/views/Bayesian.html

The final result should be

Some text here 
[...]
-   CRAN Task View: [Bayesian](http://cran.rstudio.com/web/views/Bayesian.html)
-   CRAN Task View: [Cluster](http://cran.rstudio.com/web/views/Cluster.html)
-   CRAN Task View: [Databases](http://cran.rstudio.com/web/views/Databases.html)
-   CRAN Task View: [Environmetrics](http://cran.rstudio.com/web/views/Environmetrics.html)
[...]
End of text here

So far, I was able to "subset" my text file using the following command:

grep "CRAN Task View: \[" "$FILE"

But when I try to pipe to this:

sed -e 's|\\([a-zA-Z]*\\)\\.html|http://cran.rstudio.com/web/views/\\1.html|'

It doesn't work. How can I run sed on the lines that the grep command outputs?

I'm on macOS Mojave.

Upvotes: 1

Views: 164

Answers (2)

anubhava

Reputation: 785481

This sed should work for you:

sed -E '/CRAN Task View:/s~\(([^)]+)\)~(http://cran.rstudio.com/web/views/\1)~' file

Some text here
[...]
-   CRAN Task View: [Bayesian](http://cran.rstudio.com/web/views/Bayesian.html)
-   CRAN Task View: [Cluster](http://cran.rstudio.com/web/views/Cluster.html)
-   CRAN Task View: [Databases](http://cran.rstudio.com/web/views/Databases.html)
-   CRAN Task View: [Environmetrics](http://cran.rstudio.com/web/views/Environmetrics.html)
[...]
End of text here

RegEx Details:

  • /CRAN Task View:/: Only if line matches text "CRAN Task View:"
  • s~: Substitute, using ~ as the delimiter so the / characters in the URL need no escaping
  • \(: Match a (
  • ([^)]+): Match 1+ non-) characters in capture group #1
  • \): Match a )
  • (http://cran.rstudio.com/web/views/\1): The replacement, which builds the full link from back-reference #1
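To see the address and the capture group at work, here is a small self-contained sketch that runs the same command on a trimmed copy of the sample text. The extra "A note (...)" line is a hypothetical addition, included to show that parentheses on lines without "CRAN Task View:" are left alone.

```shell
#!/bin/sh
# Trimmed sample text; the "A note" line is hypothetical and only there to
# show that lines outside the /CRAN Task View:/ address are not touched.
input='Some text here
A note (not a task view, left alone)
-   CRAN Task View: [Bayesian](Bayesian.html)
-   CRAN Task View: [Cluster](Cluster.html)
End of text here'

# Same substitution as in the answer: on matching lines, prefix the text
# inside the first (...) with the CRAN views URL.
result=$(printf '%s\n' "$input" |
  sed -E '/CRAN Task View:/s~\(([^)]+)\)~(http://cran.rstudio.com/web/views/\1)~')

printf '%s\n' "$result"
```

Only the two task-view lines change; the note line and the surrounding text come back exactly as they went in.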

Upvotes: 4

Socowi

Reputation: 27245

> sed -e 's|\\([a-zA-Z]*\\)\\.html|http://cran.rstudio.com/web/views/\\1.html|'
> It doesn't work.

This is a quoting issue. Inside single quotes '...', backslashes need no escaping: the shell passes '\\(' to sed unchanged as \\(, and sed reads that as a literal backslash followed by a literal (. Your pattern therefore looks for text like \(someLetters\)\.html, which never occurs in your file.
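A quick way to check this is to print the script exactly as sed receives it and then run it on one sample line. This is a minimal sketch; the variable names are only for illustration.

```shell
#!/bin/sh
line='-   CRAN Task View: [Bayesian](Bayesian.html)'

# Inside single quotes the shell keeps every backslash, so printf shows the
# script byte for byte as sed will receive it (doubled backslashes included).
script='s|\\([a-zA-Z]*\\)\\.html|http://cran.rstudio.com/web/views/\\1.html|'
printf '%s\n' "$script"

# sed reads each \\ as one literal backslash, so the pattern looks for text
# like \(Bayesian\)\.html; the line has no backslashes, so nothing changes.
unchanged=$(printf '%s\n' "$line" | sed -e "$script")
printf '%s\n' "$unchanged"
```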

You probably meant sed 's|\([a-zA-Z]*\)\.html|http://cran.rstudio.com/web/views/\1.html|'.
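With single backslashes, the original grep-then-sed pipeline works as intended. The sketch below assumes a throwaway sample file created with mktemp in place of the question's $FILE; its contents are a trimmed, hypothetical copy of the sample text.

```shell
#!/bin/sh
# Hypothetical sample file standing in for "$FILE" from the question.
FILE=$(mktemp)
cat > "$FILE" <<'EOF'
Some text here
-   CRAN Task View: [Bayesian](Bayesian.html)
-   CRAN Task View: [Cluster](Cluster.html)
End of text here
EOF

# grep keeps only the task-view lines; sed then turns each relative file
# name into an absolute URL. \( \) is the BRE group, \. a literal dot.
links=$(grep 'CRAN Task View: \[' "$FILE" |
  sed 's|\([a-zA-Z]*\)\.html|http://cran.rstudio.com/web/views/\1.html|')

printf '%s\n' "$links"
rm -f "$FILE"
```

Note that this prints only the matching lines, because grep has already dropped the rest of the text.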

By the way: sed can also do the grep part for you. Also, with -E you need fewer backslashes. And since the replacement puts the whole matched file name back, you don't need the group \(...\) at all: & stands for the entire match.

sed -E -n '/CRAN Task View: \[/s|[a-zA-Z]*\.html|http://cran.rstudio.com/web/views/&|p' "$FILE"
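Here is a minimal sketch of that single-sed version run on a trimmed, hypothetical copy of the sample text (read from stdin rather than a file), showing that -n together with the p flag reproduces the grep filtering.

```shell
#!/bin/sh
input='Some text here
-   CRAN Task View: [Bayesian](Bayesian.html)
-   CRAN Task View: [Databases](Databases.html)
End of text here'

# -n turns off automatic printing; the p flag prints a line only when the
# substitution succeeded, so sed alone does the job of grep + sed.
# & in the replacement is the whole match, e.g. Bayesian.html.
only_links=$(printf '%s\n' "$input" |
  sed -E -n '/CRAN Task View: \[/s|[a-zA-Z]*\.html|http://cran.rstudio.com/web/views/&|p')

printf '%s\n' "$only_links"
```

If "inline" means editing the file in place instead, drop -n and p so the other lines are kept, and add -i: BSD sed on macOS needs an explicit (possibly empty) backup suffix, as in sed -i '' -E '...' "$FILE", while GNU sed accepts sed -i -E '...' "$FILE".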

Upvotes: 1
