Divyanshu

Reputation: 99

Replacing a line in a file based on a keyword search, by line from another file

Here is my file1:

agadfad
sdffasdf
Element 1, 0, 0, 0
Pcom
Element 2

Here is my file2:

PBAR
Element 1, 100, 200, 300, 400
Element 2
Continue...

I want to search for a keyword, "Element 1", in file1 and, if it is found, store the whole line. Then I want to search for the same keyword in file2 and, if it is found on some line, replace that line with the data from file1, which in this case is "Element 1, 0, 0, 0". The same should work for more keywords ("Element 2", "Element 3" and so on) and for very big files, but that part comes later. I tried the following code:

    index1 = 0
    index2 = 0
    path1 = "C:\Users\sony\Desktop\BDF1.TXT"
    path2 = "C:\Users\sony\Desktop\BDF2.TXT"
    Target = 'Element 1'
    with open(path1) as f1:
       list1 = f1.readlines()
       for line in list1:
           index1 = index1 + 1
           if Target in line:
               print "Match Found at line %d" %(index)
           else:
               print "No Match Found in the target file!"
           with open(path2, "r+") as f2:
               list2 = f2.readlines()
               for line2 in list2:
                   index2 = index2 + 1
                   if Target in line2:
                        list2[index2] = line + '                    \n'
                   else:
                        print "No match found in the targetorg file!"
               f2.writelines(list2)

I am getting some output which looks like this:

PBAR
Element 1, 100, 200, 300, 400
Element 2
Continue... PBAR
Element 1, 100, 200, 300, 400
agadfad
Continue...

I am also getting the error "list assignment index out of range" somewhere around line 20. It seems like it should be easy, but I am having a hard time figuring it out.

Upvotes: 0

Views: 114

Answers (1)

Reti43

Reputation: 9796

Regular expressions will do what you want easily, assuming the format is always the same. That is, each line has the format "Element N, more stuff", where

  • "Element N" is always capitalised: the word Element, followed by a space and then only digits
  • "more stuff" consists only of spaces, commas and digits.

Code

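    import re

    # path1 and path2 are the file paths defined in the question
    with open(path1) as f1, open(path2) as f2:
        dat1 = f1.read()
        dat2 = f2.read()

    # Collect every "Element N, ..." line from the first file
    matches = re.findall(r'^Element [0-9]+,[0-9, ]+', dat1, flags=re.MULTILINE)
    for match in matches:
        # "Element N" is the part before the first comma
        key = match.split(',')[0]
        # Replace the line with the same key in the second file's data
        dat2 = re.sub(r'^{},[0-9, ]+'.format(key), match, dat2, flags=re.MULTILINE)

    # Write the result to a new file, leaving the second file untouched
    with open('changed.txt', 'w') as f:
        f.write(dat2)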

Explanation

The pattern "^Element [0-9]+,[0-9, ]+" starts at the beginning of a line (because of ^ together with the re.MULTILINE flag), and matches the string "Element", followed by a space, followed by one or more digits ([0-9]+), followed by a comma, followed by one or more digits, commas and spaces in any combination ([0-9, ]+). This will effectively find "Element 1, 0, 0, 0", "Element 2, 123, 123, 123, 123, 123" (for example), etc.
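As a quick check of that pattern, here is a minimal sketch that runs it against the contents of file1 from the question, held in a string instead of being read from disk. The name `sample` is only for illustration.

```python
import re

# Contents of file1 from the question, inlined as a string for illustration.
sample = "agadfad\nsdffasdf\nElement 1, 0, 0, 0\nPcom\nElement 2\n"

pattern = r'^Element [0-9]+,[0-9, ]+'
matches = re.findall(pattern, sample, flags=re.MULTILINE)
print(matches)  # ['Element 1, 0, 0, 0']
```

Note that the bare "Element 2" line is not matched, because the pattern requires a comma after the number.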

Then you iterate through these matches, search the second file's contents for a string that matches "^Element 1,[0-9, ]+" (and so on for each match), and replace it with the match from the first file.
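The same two steps can also be wrapped in a small function and tried on the sample data from the question. This is only a sketch under the same format assumptions: `replace_elements` is a hypothetical helper name, and it swaps the loop of `re.sub` calls for a lookup dictionary plus a replacement callback, so the second file's contents are scanned once instead of once per match. That may matter for the very big files mentioned in the question.

```python
import re

PATTERN = r'^(Element [0-9]+),[0-9, ]+'

def replace_elements(dat1, dat2):
    """Replace 'Element N, ...' lines in dat2 with the matching lines from dat1."""
    # Map each key such as 'Element 1' to its full line from the first file.
    lookup = {}
    for m in re.finditer(PATTERN, dat1, flags=re.MULTILINE):
        lookup[m.group(1)] = m.group(0)

    def swap(m):
        # Keep the original text when the first file has no entry for this key.
        return lookup.get(m.group(1), m.group(0))

    return re.sub(PATTERN, swap, dat2, flags=re.MULTILINE)

# Sample data from the question, inlined as strings for illustration.
file1 = "agadfad\nsdffasdf\nElement 1, 0, 0, 0\nPcom\nElement 2\n"
file2 = "PBAR\nElement 1, 100, 200, 300, 400\nElement 2\nContinue...\n"

result = replace_elements(file1, file2)
print(result)
```

With the sample data, the "Element 1" line in the second file becomes "Element 1, 0, 0, 0" and every other line is left unchanged.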

Upvotes: 1
