Liondancer

Reputation: 16479

finding all whitespace in a line

I am trying to replace each run of whitespace in my data file with a single comma. I am currently using a regex to do this.

I am getting the error:

Traceback (most recent call last):
  File "parse_prime.py", line 12, in <module>
    newline = line.replace(whitespace,",")
TypeError: expected a character buffer object

Here is my code

import re

token = re.compile(r'\s*')
f = open("prime_data.txt","r")
fw = open("prime_out.txt", "w+")

primelist = []

for line in f.readlines():
    findtoken = re.search(token, line)
    replacetoken = line.replace(findtoken,",")

    fw.write(newline)

I don't think I am searching with the regex properly. I think I am stopping once the first run of whitespace is found. How do I search through the whole line?

The data file is in this format:

43    3    2    2    123    3

Upvotes: 0

Views: 1249

Answers (1)

Martijn Pieters

Reputation: 1124288

You need to use token.sub() here, with the correct pattern (one that matches one or more whitespace characters):

token = re.compile(r'\s+')

for line in f:
    newline = token.sub(',', line)
    fw.write(newline)

I dropped the .readlines() call; file objects can be looped over directly, no need to read them into memory wholesale.
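As a minimal sketch of the substitution on an in-memory string (the sample line is taken from the question; the helper name to_csv_line is hypothetical): note that \s+ also matches the trailing newline, so substituting on the raw line leaves a trailing comma and drops the line break. Stripping the line first is one way to avoid that.

```python
import re

# One or more whitespace characters, as in the pattern above.
token = re.compile(r'\s+')

def to_csv_line(line):
    # Hypothetical helper: strip leading/trailing whitespace first so the
    # newline at the end of the line is not turned into a comma.
    return token.sub(',', line.strip())

sample = "43    3    2    2    123    3\n"

raw = token.sub(',', sample)      # newline is replaced too
clean = to_csv_line(sample)       # no trailing comma

print(repr(raw))
print(repr(clean))
```

When writing the cleaned line to a file, the newline has to be added back, for example fw.write(to_csv_line(line) + '\n').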

You could also just use str.split() here instead, no regular expressions required:

for line in f:
    newline = ','.join(line.split())
    fw.write(newline + '\n')
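The split-and-join approach could be wrapped up roughly as follows. This is only a sketch: the function names convert and convert_file are made up for illustration, and skipping blank lines is an assumption about what you want, not something the question specifies.

```python
def convert(lines):
    # Hypothetical helper: yield one comma-separated line per input line.
    for line in lines:
        # split() with no arguments splits on any run of whitespace and
        # ignores leading and trailing whitespace, including the newline.
        fields = line.split()
        if fields:  # assumption: blank lines are skipped
            yield ','.join(fields) + '\n'

def convert_file(src, dst):
    # Hypothetical wrapper: the with statement closes both files on exit.
    with open(src, 'r') as f, open(dst, 'w') as fw:
        fw.writelines(convert(f))

demo = list(convert(["43    3    2    2    123    3\n", "\n", "  7\t8 \n"]))
print(demo)
```

With the file names from the question, the call would be convert_file("prime_data.txt", "prime_out.txt").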

You were trying to call str.replace() which only accepts strings, but you were passing in a re.MatchObject value instead.
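To see why the original attempt fails, here is a small sketch (the shortened sample line is an assumption, modelled on the question's data). re.search() returns a match object for the first match only, and with \s* that first match is the empty string at the start of the line, because zero whitespace characters also satisfy the pattern.

```python
import re

line = "43    3    2\n"

# \s* can match zero characters, so the first match is empty.
empty_match = re.search(r'\s*', line)
# \s+ needs at least one whitespace character.
first_match = re.search(r'\s+', line)

print(repr(empty_match.group()))
print(repr(first_match.group()), first_match.span())

# str.replace() wants strings, not a match object.
try:
    line.replace(first_match, ",")
    error = None
except TypeError as exc:
    error = exc
print(type(error).__name__)

# Passing the matched text works, but only replaces runs of exactly
# that width, which is why token.sub() is the better tool here.
replaced = line.replace(first_match.group(), ",")
print(repr(replaced))
```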

Upvotes: 4
