Joel G Mathew

Reputation: 8061

grep regex query

I ran an in-place sed over a large directory of PHP files. The intention was to convert the http:// prefix of URLs to protocol-relative URLs (//) so that they would also work on https pages. Inadvertently, I also did this to several hundred curl calls, and unfortunately curl doesn't accept those URLs. So I tried to find them by grepping for the pattern //. The problem is that comments also begin with //. I tried chaining grep in a pipe so that I could exclude comments.

I was trying to exclude lines that have one or more spaces before the //, since most of my comments seemed to be indented. But it's not working.

grep --color=always -inr '//' *php | grep -v '^\s+//'

My reasoning is that the first grep matches every line containing two slashes, and the second one then excludes those lines that begin with one or more spaces followed by //. However, it doesn't seem to work like that. Here's a sample of what I got:

tvsearch.php:3:// Resource for iteration of nested php arrays
tvsearch.php:4:// //stackoverflow.com/a/3684584/1305947
tvsearch.php:14:// define('__ROOT__', dirname(dirname(__FILE__)));
tvsearch.php:15:// require_once(__ROOT__.'/htdocs/config.php');
tvsearch.php:16:// require_once(__ROOT__.'/htdocs/sqlfunctions.php');
tvsearch.php:17:// GetCredentialsDB();
tvsearch.php:19:  // <link href="/css/bootstrap.min.css" rel="stylesheet">
tvsearch.php:20:  // <link href="https://code.jquery.com/ui/1.11.3/themes/smoothness/jquery-ui.css" rel="Stylesheet"></link>
tvsearch.php:21:  // <script src="/js/bootstrap.js"></script>
tvsearch.php:22:  // <script src="https://code.jquery.com/ui/1.11.4/jquery-ui.js"></script>
tvsearch.php:25:// <link href="/css/grid.css" rel="stylesheet">
tvsearch.php:26:// <link href="/css/cover.css?v=1" rel="stylesheet">
tvsearch.php:44:    <!-- link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0-alpha.5/css/bootstrap.min.css" integrity="sha384-AysaV+vQoT3kOAXZkl02PThvDr8HYKPZhNT5h/CXfBThSRXQ6jW5DO2ekP5ViFdi" crossorigin="anonymous"> -->
tvsearch.php:52:    <!-- <script src="https://ajax.googleapis.com/ajax/libs/jquery/3.1.1/jquery.min.js"  integrity="sha384-3ceskX3iaEnIogmQchP8opvBy3Mi7Ce34nWjpBIwVTHfGYWQS9jwHDVRnpKKHJg7" crossorigin="anonymous"></script> -->
tvsearch.php:53:    <!-- <script src="https://cdnjs.cloudflare.com/ajax/libs/tether/1.3.7/js/tether.min.js" integrity="sha384-XTs3FgkjiBgo8qjEjBk0tGmf3wPrWtA6coPfQDfFEY8AnYJwjalXCiosYRBIBZX8" crossorigin="anonymous"></script> -->
tvsearch.php:56:    <script src="https://maxcdn.bootstrapcdn.com/bootstrap/4.0.0-alpha.5/js/bootstrap.min.js" integrity="sha384-BLiI7JTZm+JWlgKa0M0kGRpJbF2J8q+qreVrKBC47e3K6BW78kGLrCkeRX6I9RoK" crossorigin="anonymous"></script>
tvsearch.php:100:              <li><a href="//www.tv.com" target="_blank">TV.com</a></li>';
tvshowcarousel.php:54:                <li><a href="//www.tv.com" target="_blank">TV.com</a></li>
tvtest.php:2:// Resource for iteration of nested php arrays
tvtest.php:3:// //stackoverflow.com/a/3684584/1305947
tvtest.php:11:// require_once "/tvmaze/TVMazeIncludes.php";
tvtest.php:18:   //Return all tv shows relating to the given input
tvtest.php:19:   // $showinfo = $Client->TVMaze->search("Arrow");
tvtest.php:21:   //Return the most relevant tv show to the given input
tvtest.php:23:   // Array [0] contains general info about the show
tvtest.php:29:   // print_r($showinfo);
tvtest.php:30:   // Array [1] contains all episode information
tvtest.php:32:   //
tvtest.php:33:   // // print_r($showinfo[1]);
tvtest.php:37:                  // print_r($innerArray);
tvtest.php:40:                           // echo "<p>Key:$key</p>";
tvtest.php:41:                           // echo "<p>Value:$value</p>";
tvtest.php:47:                                  // print "<p>Season:".$season." Episode:".$episode."</p>";
tvtest.php:57:   // print $showinfo[1]['season']
tvtest.php:58:  // $tmpArray = $showinfo[1];
tvtest.php:59:  // foreach ($tmpArray as $innerArray) {
tvtest.php:60:  //      print $innerArray['season'];
tvtest.php:61:  // }
tvtest.php:65:   // print_r($showinfo[0]->[summary]);

Where am I going wrong? What I need is to match only lines like these:

tvshowcarousel.php:54:                <li><a href="//www.tv.com" target="_blank">TV.com</a></li>
torcontrol.php:        curl_setopt($ch,CURLOPT_URL,"//".$this->host."/gui/?action=forcestart".$hashes);

To summarize the problem: devise a grep query that finds // within a line, but not at the beginning of the line (ignoring any spaces before it).

Upvotes: 0

Views: 115

Answers (1)

MauricioRobayo

Reputation: 2356

Maybe you can focus on the pattern you are trying to catch instead of the pattern you are not trying to catch.

For example, if every // you want to find comes right after a double quote, you can try something like this:

grep --color=always -inr '"//.*"' *php
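
If the // is not always inside quotes, a single anchored pattern may be an alternative. Some background on why the two-step pipe in the question keeps the comments: with -n and -r each output line starts with "file:line:", so ^ in the second grep anchors on the file name rather than on the source line; in GNU grep's default (basic) syntax + is a literal character unless you write \+ or use -E; and --color=always puts escape codes around the match before the second grep sees it. The sketch below is only an illustration under assumptions: the temporary directory, the sample.php file and the variable names are made up for the demo, and the sample lines are shortened from the question.

```shell
# Hypothetical sample file, created only so the sketch can be run as-is.
demo=$(mktemp -d)
cat > "$demo/sample.php" <<'EOF'
// //stackoverflow.com/a/3684584/1305947
   // print_r($showinfo);
<li><a href="//www.tv.com" target="_blank">TV.com</a></li>
        curl_setopt($ch,CURLOPT_URL,"//".$this->host."/gui/");
EOF

# Keep lines whose first non-blank character is not a slash and that
# contain // somewhere after it. One grep does both jobs, so ^ really
# anchors on the source line. --color=always is left out on purpose.
pattern='^[[:space:]]*[^[:space:]/].*//'
matches=$(cd "$demo" && grep -rnE --include='*.php' "$pattern" .)
printf '%s\n' "$matches"

rm -rf "$demo"
```

On the sample this keeps the href line and the curl_setopt line and drops both comment lines. Lines with a trailing // comment after code, or with an https:// URL, would still match, so the pattern may need further narrowing for a real code base.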

Upvotes: 1
