ffledgling

Reputation: 12150

Sed and awk application

I've read a little about sed and awk and understand that both are text-manipulation tools.

I plan to use one of them to make similar changes across large sets of source files (JavaScript, Python, etc.). For now that mainly means editing function definitions (the parameters passed) and variable names, but the more I can do the better.

Has anyone attempted something similar, and if so, are there any obvious pitfalls to look out for? Which of sed and awk is better suited to this kind of job, or is something else entirely a better fit?

Input

function(paramOne){
//Some code here
var variableOne = new ObjectType;
array[1] = "Some String";
instanceObj = new Something.something;
}

Output

function(ParameterOne){
//Some code here
var PartOfSomething.variableOne = new ObjectType;
sArray[1] = "Some String";
var instanceObj = new Something.something;
}

Upvotes: 0

Views: 661

Answers (2)

Udo Klein

Reputation: 6902

As soon as the task gets slightly more complicated you will switch to a scripting language anyway. So why not start with Python in the first place?

Walking directories: walking along and processing files in directory in python

Replacing text in a file: replacing text in a file with Python

Python regex howto: http://docs.python.org/dev/howto/regex.html

I also recommend installing Eclipse + PyDev, as this makes debugging a lot easier.

Here is an example of a simple automatic replacer:

import itertools
import os
import re

# A raw string cannot end in a backslash, so the trailing separator is dropped.
folder = r"C:\Workspaces\Test"
# Binary files and files without an extension are left untouched.
skip_extensions = ['.gif', '.png', '.jpg', '.mp4', '']
substitutions = [("Test.Alpha.", "test.alpha."),
                 ("Test.Beta.", "test.beta."),
                 ("Test.Gamma.", "test.gamma.")]


def context(text, literal):
    # Every literal match plus up to 5 characters of context on each side.
    # re.escape keeps the dots from acting as regex wildcards.
    return [text[max(m.start() - 5, 0):m.end() + 5]
            for m in re.finditer(re.escape(literal), text)]


for root, dirs, files in os.walk(folder):
    for name in files:
        base, ext = os.path.splitext(name)
        file_path = os.path.join(root, name)
        if ext in skip_extensions:
            print("skipping", file_path)
            continue
        print("processing", file_path)

        with open(file_path) as f:
            s = f.read()

        before = [context(s, old) for old, new in substitutions]
        for old, new in substitutions:
            s = s.replace(old, new)
        after = [context(s, new) for old, new in substitutions]

        # Report each replacement with its surrounding context.
        for b, a in zip(itertools.chain(*before), itertools.chain(*after)):
            print(b, "-->", a)

        # Overwrite the file in place with the substituted text.
        with open(file_path, "w") as f:
            f.write(s)
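
The replacer above matches plain substrings, which suits dotted prefixes like "Test.Alpha." but not bare variable names: renaming `array` that way would also touch `myarray`. A minimal sketch of a whole-identifier rename follows. The helper name `rename_identifier` and the sample line are made up for illustration, and the pattern still cannot tell code from strings or comments.

```python
import re


def rename_identifier(source, old, new):
    """Replace `old` with `new` only where it stands alone as an identifier."""
    # (?<![\w.]) rejects matches inside a longer name (myarray) or after a
    # dot (obj.array); (?!\w) rejects matches followed by more name characters.
    pattern = re.compile(r"(?<![\w.])" + re.escape(old) + r"(?!\w)")
    # A function replacement avoids backslash interpretation in `new`.
    return pattern.sub(lambda m: new, source)


line = 'array[1] = myarray.length + obj.array;'
print(rename_identifier(line, "array", "sArray"))
# prints: sArray[1] = myarray.length + obj.array;
```

Swapping `s.replace(old, new)` for a call like this in the loop would be one way to use it, assuming the substitutions are identifiers rather than dotted prefixes.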

Upvotes: 0

Ed Morton

Reputation: 204174

Here's a GNU awk script (GNU awk is needed for the gensub() function) that will transform your sample input file into your desired output file:

$ cat tst.awk
BEGIN{ sym = "[[:alnum:]_]+" }
{
   $0 = gensub("^(" sym ")[(](" sym ")[)](.*)","\\1(ParameterOne)\\3","")
   $0 = gensub("^(var )(" sym ")(.*)","\\1PartOfSomething.\\2\\3","")
   $0 = gensub("^a(rray.*)","sA\\1","")
   $0 = gensub("^(" sym " =.*)","var \\1","")

   print
}

$ cat file
function(paramOne){
//Some code here
var variableOne = new ObjectType;
array[1] = "Some String";
instanceObj = new Something.something;
}

$ gawk -f tst.awk file
function(ParameterOne){
//Some code here
var PartOfSomething.variableOne = new ObjectType;
sArray[1] = "Some String";
var instanceObj = new Something.something;
}

BUT think about how your real input could vary from that: you could have more, less, or different spacing between symbols. You could have assignments that start on one line and finish on the next. You could have comments containing lines that look like the code but that you don't want changed. You could have multiple statements on one line. And so on.
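
To make a few of those failure modes concrete, here is a small Python sketch that re-creates the "var" rule from the awk script and runs it over some variations of the sample line. The character class `[A-Za-z0-9_]` is an approximation of `[[:alnum:]_]`, and the sample lines are invented for illustration.

```python
import re

# Rough Python equivalent of the second gensub() rule: prefix the name that
# follows a leading "var " with "PartOfSomething.".
RULE = re.compile(r"^(var )([A-Za-z0-9_]+)(.*)")


def apply_rule(line):
    return RULE.sub(r"\1PartOfSomething.\2\3", line, count=1)


samples = [
    "var variableOne = new ObjectType;",         # the sample input: rewritten
    "    var variableOne = new ObjectType;",     # indented: silently skipped
    "var  variableOne = new ObjectType;",        # two spaces: silently skipped
    "x = 1; var variableOne = new ObjectType;",  # second statement: skipped
    "var variableOne was removed in v2 */",      # comment text: wrongly rewritten
]
for line in samples:
    print(repr(line), "->", repr(apply_rule(line)))
```

Only the first line is handled as intended; the next three are missed without any warning, and the last one shows a line of comment text being changed as if it were code.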

You can address each issue one at a time, but that could take a lot longer than just updating your files by hand, and chances are you still won't get it completely right.

If your code is EXCEEDINGLY well structured and RIGOROUSLY follows a specific, highly restrictive coding format then you might be able to do what you want with a scripting language, but your best bets are either:

  1. change the files by hand if there are fewer than, say, 10,000 of them, or
  2. get hold of a parser (e.g. the compiler) for the language your files are written in and modify it to emit your updated code.
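
As a rough illustration of the parser route, for Python source only, the standard `ast` module can rename a parameter without touching strings or comments that happen to contain the same text. This is a sketch under stated assumptions: the class name `RenameInFunction` and the sample source are hypothetical, `ast.unparse` needs Python 3.9 or later, the regenerated code loses comments and original formatting, and nested scopes that shadow the name are not handled.

```python
import ast


class RenameInFunction(ast.NodeTransformer):
    """Rename one parameter, and its uses, inside one named function."""

    def __init__(self, func_name, old, new):
        self.func_name, self.old, self.new = func_name, old, new
        self.inside = False  # True while visiting the target function

    def visit_FunctionDef(self, node):
        if node.name != self.func_name:
            return self.generic_visit(node)
        self.inside = True
        self.generic_visit(node)
        self.inside = False
        return node

    def visit_arg(self, node):
        # Parameter declarations in the function signature.
        if self.inside and node.arg == self.old:
            node.arg = self.new
        return node

    def visit_Name(self, node):
        # Uses of the name in the function body.
        if self.inside and node.id == self.old:
            node.id = self.new
        return node


source = '''
def greet(param_one):
    # "param_one" in this comment is not code
    label = "param_one"
    return label + str(param_one)

def other(param_one):
    return param_one
'''

tree = RenameInFunction("greet", "param_one", "parameter_one").visit(ast.parse(source))
result = ast.unparse(tree)
print(result)
```

The string literal and the second function keep the old name because the rename works on syntax nodes rather than on text, which is exactly what a line-oriented regex cannot guarantee.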

Upvotes: 2
