Reputation: 21

Python: find and replace sequence of numbers in a file

I would like to replace a sequence of numbers in a file with another sequence of numbers. For example, I want the code to find:

in the file and replace it with

1
2
3
.
.

The format of the file is like this:

    5723    1   4  0.0530  40.8469574826  23.6497161096  71.2721134368  # hc
    5724    1   4  0.0530  41.2184192051  22.0657965663  70.7655969235  # hc
    5725    1   4  0.0530  40.1209834536  22.2320441560  72.1100610464  # hc
    5726    1   2  0.0390  38.2072673529  21.5636299564  70.4226801302  # ni
    5727    1   3  0.0080  39.1491515464  22.7414447024  70.1836001683  # c1
    5728    1   4  0.0530  38.6092690356  23.6286807105  70.4379331882  # hc
    5729    1   5 -0.1060  39.4744610200  22.9631667398  68.7099315672  # c
    5730    1   4  0.0530  39.7733681662  22.0164196098  68.2561710623  # hc
    5731    1   4  0.0530  40.3997078786  23.5957910115  68.6602988667  # hc
    5732    1   6 -0.1768  37.4127695738  20.7445960448  69.5033013922  # c5
    5733    1   7  0.1268  37.5907142     20.8480311755  68.4090824525  # h

I've written this code to do this, but it only replaces the first one. How can I correct this code?

import os
import sys
import fileinput

masir = os.curdir + '\\test\\'
input  = open('poly-IL9.data', 'r')
output = open('out.data', 'w')
range1 = range(5722,13193)
range2 = range(1,7472)


for i in range(len(x1)):
    for j in range(len(y1)):
        x = str(range1[i])
        y = str(range2[j])
        clean = input.read().replace(x,y)
        output.write(clean)

Upvotes: 1

Answers (3)

Kasravnd

Reputation: 107297

First of all, open your file with a with statement, so that it is closed automatically instead of being left open.

The with statement is used to wrap the execution of a block with methods defined by a context manager.

All you need here is to loop over your file, split each line, and replace the first element with the line number:

with open('poly-IL9.data', 'r') as inp,open('out.data', 'w') as out :
    for i,line in enumerate(inp,1):
       out.write(' '.join([str(i)]+line.split()[1:])+'\n')

You can use enumerate to loop over your file object while keeping track of the line index; passing 1 as the second argument starts the count at 1.
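
As a minimal sketch of the same enumerate idea on in-memory lines: the function name renumber and the choice to skip blank lines are assumptions added for illustration, not part of the original answer.

```python
def renumber(lines, start=1):
    """Yield each non-blank line with its first field replaced by a counter."""
    # Filter blank lines first so they do not consume a number.
    non_blank = (line for line in lines if line.strip())
    for n, line in enumerate(non_blank, start):
        # split() with no argument collapses any run of whitespace.
        yield ' '.join([str(n)] + line.split()[1:])


# Hypothetical sample in the same layout as the question's file.
sample = [
    "    5723    1   4  0.0530  # hc\n",
    "\n",
    "    5724    1   4  0.0530  # hc\n",
]
result = list(renumber(sample))
```

Because the function accepts any iterable of lines, an open file object can be passed to it directly.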

As an alternative, you can use the csv module to read the file, which avoids splitting the lines yourself:

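import csv
with open('poly-IL9.data', 'r', newline='') as inp, open('out.data', 'w') as out:
    spamreader = csv.reader(inp, delimiter=' ')
    for i, row in enumerate(spamreader, 1):
        # runs of spaces produce empty fields, so drop them before renumbering
        fields = [f for f in row if f]
        out.write(' '.join([str(i)] + fields[1:]) + '\n')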

Note that if your file is separated by other whitespace characters, or a mix of them, you can use the re.split() function to split each line with a regex:

import re
with open('poly-IL9.data', 'r') as inp, open('out.data', 'w') as out:
    for i, line in enumerate(inp, 1):
        # strip() first, otherwise leading whitespace yields an empty first field
        out.write(' '.join([str(i)] + re.split(r'\s+', line.strip())[1:]) + '\n')
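
The approaches above rejoin the fields with single spaces, so the original column alignment is lost. If the layout matters, one possible variation (an assumption on my part, not something the question asks for) is to replace only the leading integer with a regex and leave the rest of each line untouched. The names LEADING_INT and renumber_keep_layout are hypothetical.

```python
import re

# Optional leading whitespace followed by the integer to be replaced.
LEADING_INT = re.compile(r'^\s*\d+')


def renumber_keep_layout(lines, start=1):
    """Replace the leading integer of each line, right-aligned in the same width."""
    out = []
    n = start
    for line in lines:
        m = LEADING_INT.match(line)
        if m is None:
            # Lines without a leading integer are passed through unchanged.
            out.append(line)
            continue
        width = len(m.group(0))  # width of the original padding plus number
        out.append(f"{n:>{width}}" + line[m.end():])
        n += 1
    return out


# Hypothetical sample in the same layout as the question's file.
sample = ["    5723    1   4\n", "    5724    1   4\n"]
renumbered = renumber_keep_layout(sample)
```

Only the first number on each line is matched, so digits that appear in the coordinate columns are never touched.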

Upvotes: 1

Zero

Reputation: 76927

If you want to work with the data, you may want to consider using the pandas library.

Here's one way to do it in pandas.

Read the csv file using pd.read_csv

In [4]: df = pd.read_csv('temp.csv')

In [5]: df
Out[5]:
      b  c       d          e          f          g
5723  1  4  0.0530  40.846957  23.649716  71.272113
5724  1  4  0.0530  41.218419  22.065797  70.765597
5725  1  4  0.0530  40.120983  22.232044  72.110061
5726  1  2  0.0390  38.207267  21.563630  70.422680
5727  1  3  0.0080  39.149152  22.741445  70.183600
5728  1  4  0.0530  38.609269  23.628681  70.437933
5729  1  5 -0.1060  39.474461  22.963167  68.709932
5730  1  4  0.0530  39.773368  22.016420  68.256171
5731  1  4  0.0530  40.399708  23.595791  68.660299
5732  1  6 -0.1768  37.412770  20.744596  69.503301
5733  1  7  0.1268  37.590714  20.848031  68.409082

Use reset_index(drop=True) to reset the index. The new index starts from 0:

In [6]: df.reset_index(drop=True)
Out[6]:
    b  c       d          e          f          g
0   1  4  0.0530  40.846957  23.649716  71.272113
1   1  4  0.0530  41.218419  22.065797  70.765597
2   1  4  0.0530  40.120983  22.232044  72.110061
3   1  2  0.0390  38.207267  21.563630  70.422680
4   1  3  0.0080  39.149152  22.741445  70.183600
5   1  4  0.0530  38.609269  23.628681  70.437933
6   1  5 -0.1060  39.474461  22.963167  68.709932
7   1  4  0.0530  39.773368  22.016420  68.256171
8   1  4  0.0530  40.399708  23.595791  68.660299
9   1  6 -0.1768  37.412770  20.744596  69.503301
10  1  7  0.1268  37.590714  20.848031  68.409082

You could also construct your own unique index starting from 1, like this:

In [7]: df.set_index(np.arange(1, len(df)+1))
Out[7]:
    b  c       d          e          f          g
1   1  4  0.0530  40.846957  23.649716  71.272113
2   1  4  0.0530  41.218419  22.065797  70.765597
3   1  4  0.0530  40.120983  22.232044  72.110061
4   1  2  0.0390  38.207267  21.563630  70.422680
5   1  3  0.0080  39.149152  22.741445  70.183600
6   1  4  0.0530  38.609269  23.628681  70.437933
7   1  5 -0.1060  39.474461  22.963167  68.709932
8   1  4  0.0530  39.773368  22.016420  68.256171
9   1  4  0.0530  40.399708  23.595791  68.660299
10  1  6 -0.1768  37.412770  20.744596  69.503301
11  1  7  0.1268  37.590714  20.848031  68.409082

Note: There are simpler ways to just modify the file. However, if you want to process or analyze the data, using pandas will make your life easier.

Upvotes: 0

Craig Burgler

Reputation: 1779

The read() method in clean = input.read().replace(x,y) reads the entire file at once. That leaves the file position at the end, so every later call to read() returns an empty string, and it makes sense that only one replacement is made. Try readline() or, preferably, for line in file: to process the file line by line.
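
The behaviour can be checked with a small sketch. It uses io.StringIO as an in-memory stand-in for the real file, and the two-line sample content is made up for illustration.

```python
import io

# In-memory stand-in for the data file (hypothetical two-line sample).
handle = io.StringIO("5723 1 4\n5724 1 4\n")

first = handle.read()   # consumes everything; the position is now at the end
second = handle.read()  # nothing is left, so this is an empty string

# Rewind, then iterate line by line instead of calling read() repeatedly.
handle.seek(0)
line_count = sum(1 for _ in handle)
```

A real file object opened with open() behaves the same way with respect to read() and iteration.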

Upvotes: 0
