aaront

Reputation: 1245

Regex Substitution in Python

I have a CSV file with several entries, and each entry contains two dates stored as Unix timestamps.

I have a function called convert() that takes a timestamp and converts it to YYYYMMDD.

Since each line has two timestamps, how would I replace each one with its converted value?

EDIT: To clarify, I would like to convert every occurrence of a timestamp into the YYYYMMDD format. What is tripping me up is that re.findall() returns a list.

Upvotes: 2

Views: 1766

Answers (4)

buster

Reputation: 1151

I can't comment on your question yet, but have you taken a look at Python's csv module? http://docs.python.org/library/csv.html#module-csv
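
As a rough illustration of the csv-module route, here is a minimal sketch. It assumes the timestamps sit in the second and fourth columns (as in a line like foo,1234567890,bar,1243310263) and uses a stand-in convert(), since the asker's own function is not shown; convert_columns and its columns parameter are hypothetical names made up for this example.

```python
import csv
import io
import time

def convert(unixtime):
    # Stand-in for the asker's convert(): Unix seconds -> YYYYMMDD in UTC.
    return time.strftime("%Y%m%d", time.gmtime(unixtime))

def convert_columns(infile, outfile, columns=(1, 3)):
    # columns is an assumption: zero-based positions of the timestamp fields.
    writer = csv.writer(outfile, lineterminator="\n")
    for row in csv.reader(infile):
        for i in columns:
            row[i] = convert(int(row[i]))
        writer.writerow(row)

# In-memory files keep the example self-contained; real code would use open().
source = io.StringIO("foo,1234567890,bar,1243310263\n")
result = io.StringIO()
convert_columns(source, result)
print(result.getvalue(), end="")  # foo,20090213,bar,20090526
```

Addressing fields by position avoids touching other numeric columns, which a regex over the whole line cannot easily guarantee.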

Upvotes: 1

fearphage

Reputation: 16918

I'd use something along these lines. It is much like Laurence's response, but it includes the timestamp conversion you asked for and takes the filename as a parameter. This code assumes you are working with recent dates (on or after 9/9/2001, when Unix timestamps reached 10 digits). If you need earlier dates, lower the 10 in the regex to 9 or less.

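import re, sys, time

# Match runs of 10 or more digits (timestamps from 9/9/2001 onward).
regex = re.compile(r'(\d{10,})')

def convert(unixtime):
  # Format seconds since the epoch as YYYYMMDD, in UTC.
  return time.strftime("%Y%m%d", time.gmtime(unixtime))

# The filename is the first command-line argument.
with open(sys.argv[1]) as infile:
  for line in infile:
    sys.stdout.write(regex.sub(lambda m: convert(int(m.group(0))), line))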

EDIT: Cleaned up the code.

Sample Input

foo,1234567890,bar,1243310263
cat,1243310263,pants,1234567890
baz,987654321,raz,1

Output

foo,20090213,bar,20090526
cat,20090526,pants,20090213
baz,987654321,raz,1 # not converted (too short to be a recent timestamp)

Upvotes: 0

Luke Schafer

Reputation: 9265

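If you know the replacement:

import re
p = re.compile(r',\d{8},')
# sub() returns a new string rather than modifying csvstring
result = p.sub(',' + someval + ',', csvstring)

If it's a format change:

# Capture year, month, and day, then emit them as DD-MM-YYYY
p = re.compile(r',(\d{4})(\d\d)(\d\d),')
result = p.sub(r',\3-\2-\1,', csvstring)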

EDIT: Sorry, just realised you said Python; modified the above.
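
One caveat with matching the commas on both sides: the pattern consumes them, so two adjacent date fields, or a date at the very start or end of a line, would not both be replaced. A possible workaround, sketched here as a variation rather than as part of the original answer, is to use lookarounds that check the field boundaries without consuming them. The reformat_dates name is made up for illustration.

```python
import re

# (?<![^,\n]) : preceded by a comma, a newline, or the start of the string.
# (?![^,\n])  : followed by a comma, a newline, or the end of the string.
pattern = re.compile(r'(?<![^,\n])(\d{4})(\d\d)(\d\d)(?![^,\n])')

def reformat_dates(csvstring):
    # Rewrite every YYYYMMDD field as DD-MM-YYYY, leaving other fields alone.
    return pattern.sub(r'\3-\2-\1', csvstring)

print(reformat_dates("20090213,20090526,foo,20090101"))
# 13-02-2009,26-05-2009,foo,01-01-2009
```

Fields with more or fewer than eight digits are left untouched, because the lookarounds require a field boundary on both sides of the eight digits.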

Upvotes: 3

Laurence Gonsalves

Reputation: 143094

I assume that by "unix timestamp formatted date" you mean a number of seconds since the epoch. This assumes that every number in the file is a UNIX timestamp. If that isn't the case you'll need to adjust the regex:

import re, sys

# your convert function goes here

regex = re.compile(r'(\d+)')
for line in sys.stdin:
  sys.stdout.write(regex.sub(lambda m: convert(int(m.group(1))), line))

This reads from stdin and calls convert on each number found.

The "trick" here is that re.sub can take a function that transforms from a match object into a string. I'm assuming your convert function expects an int and returns a string, so I've used a lambda as an adapter function to grab the first group of the match, convert it to an int, and then pass that resulting int to convert.
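
To make the contrast with re.findall() concrete, here is a small self-contained sketch. The convert() below is only a stand-in for the asker's function (assumed to turn Unix seconds into YYYYMMDD in UTC), and replace_numbers is a hypothetical helper name.

```python
import re
import time

def convert(unixtime):
    # Stand-in for the asker's convert(): Unix seconds -> YYYYMMDD in UTC.
    return time.strftime("%Y%m%d", time.gmtime(unixtime))

def replace_numbers(line):
    # re.sub calls the function once per match and splices each
    # returned string back into the line in place of the match.
    return re.sub(r'\d+', lambda m: convert(int(m.group(0))), line)

line = "foo,1234567890,bar,1243310263"

# findall only extracts the matches; the rest of the line is lost.
found = re.findall(r'\d+', line)
print(found)                  # ['1234567890', '1243310263']

# sub with a function rewrites the matches where they stand.
print(replace_numbers(line))  # foo,20090213,bar,20090526
```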

Upvotes: 1
