Craig

Reputation: 105

Get date from string by splitting

I have a batch of raw text files. Each file begins with Date>>month.day year News garbage.

garbage is a whole lot of text I don't need, and varies in length. The words Date>> and News always appear in the same place and do not change.

I want to copy month day year and insert this data into a CSV file, with a new line for every file in the format day month year.

How do I copy month day year into separate variables?

I tried to split a string after one known word and before another known word. I'm familiar with string[x:y], but I basically want to change x and y from numbers into actual words (i.e. string[Date>>:News]).
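Python has no slice syntax that takes words, but `str.index` returns the position of a substring, so the numeric bounds can be computed from the marker words and then used in an ordinary slice. A minimal Python 3 sketch, assuming the markers are exactly `Date>>` and ` News` and that each appears in the text; `between` is a hypothetical helper name:

```python
def between(text, start_marker, end_marker):
    """Return the text between the first start_marker and the next end_marker."""
    # str.index raises ValueError if a marker is missing, so malformed files fail loudly
    start = text.index(start_marker) + len(start_marker)
    end = text.index(end_marker, start)
    return text[start:end]


line = "Date>>January 2. 2012 News 122"
date_text = between(line, "Date>>", " News")
print(date_text)  # January 2. 2012
```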

import re, os, sys, fnmatch, csv
folder = raw_input('Drag and drop the folder > ')
for filename in os.listdir(folder):
    # First, avoid system files
    if filename.startswith("."):
        pass
    else:
        # Tell the script the file is in this directory and can be written
        file = open(folder+'/'+filename, "r+")
        filecontents = file.read()
        thestring = str(filecontents)
        print thestring[9:20]

An example text file:

Date>>January 2. 2012 News 122

5 different news agencies have reported the story of a man washing his dog.
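For the whole task described above, here is a minimal Python 3 sketch: read the first line of every non-hidden file in a folder, pull out month, day and year, and write one CSV row per file in day, month, year order. It assumes every file starts like the example (`Date>>January 2. 2012 News ...`); the function name `dates_to_csv` is hypothetical, and files whose first line does not match are simply skipped.

```python
import csv
import os
import re

# Matches e.g. "Date>>January 2. 2012 News" at the start of the first line
DATE_RE = re.compile(r"Date>>(\S+)\s+(\d+)\.\s+(\d+)\s+News")


def dates_to_csv(folder, csv_path):
    """Write one 'day,month,year' row per matching file in folder."""
    with open(csv_path, "w", newline="") as out:
        writer = csv.writer(out)
        for filename in sorted(os.listdir(folder)):
            path = os.path.join(folder, filename)
            # Skip hidden/system files (e.g. .DS_Store) and subfolders
            if filename.startswith(".") or not os.path.isfile(path):
                continue
            with open(path) as f:
                first_line = f.readline()
            m = DATE_RE.match(first_line)
            if m:
                month, day, year = m.groups()
                writer.writerow([day, month, year])
```

Usage would be something like `dates_to_csv(folder, "dates.csv")`. Reading only the first line avoids loading the "garbage" at all, and `csv.writer` takes care of quoting if a field ever contains a comma.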

Upvotes: 0

Views: 4919

Answers (3)

dano

Reputation: 94871

Here's a solution using the re module:

import re

s = "Date>>January 2. 2012 News 122"
# Raw string so the backslashes reach the regex engine unchanged
m = re.match(r"^Date>>(\S+)\s+(\d+)\.\s+(\d+)", s)
if m:
    month, day, year = m.groups()
    print("{} {} {}".format(month, day, year))

Outputs:

January 2 2012
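If a real date object is more useful than three strings (for example to sort the rows, or to write a zero-padded day), the standard library's `datetime.strptime` can parse the text directly. A minimal sketch, assuming English month names: the `%B` directive follows the current locale, which is the English "C" locale unless the program changes it.

```python
from datetime import datetime

# "%B %d. %Y" matches "January 2. 2012": full month name, day, literal ". ", year
d = datetime.strptime("January 2. 2012", "%B %d. %Y")
print(d.strftime("%d %B %Y"))  # 02 January 2012
```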

Edit:

Actually, there's another nicer (imo) solution using re.split described in the link Robin posted. Using that approach you can just do:

month, day, year = re.split(r">>| |\. ", s)[1:4]
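To see why the slice is `[1:4]`, it helps to look at the whole list `re.split` returns for the sample line. The pattern splits on `>>`, on a single space, or on a period followed by a space, so the date pieces land at indexes 1 to 3 (Python 3):

```python
import re

s = "Date>>January 2. 2012 News 122"
parts = re.split(r">>| |\. ", s)
print(parts)  # ['Date', 'January', '2', '2012', 'News', '122']
month, day, year = parts[1:4]
```

Note that this relies on exactly one space between the pieces; two consecutive spaces would produce an empty string in the list and shift the indexes.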

Upvotes: 1

Yike Lu

Reputation: 1035

You could use str.split:

x = "A b c"
x.split(" ")  # ['A', 'b', 'c']
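Applied to the sample first line, calling `split()` with no argument (so any run of whitespace counts as one separator) gets most of the way; the `Date>>` prefix and the trailing period on the day still have to be stripped. A minimal Python 3 sketch, assuming the first line always has the shape `Date>>Month D. YYYY News ...`:

```python
line = "Date>>January 2. 2012 News 122"
words = line.split()               # ['Date>>January', '2.', '2012', 'News', '122']
month = words[0][len("Date>>"):]   # drop the fixed "Date>>" prefix
day = words[1].rstrip(".")         # "2." -> "2"
year = words[2]
print(day, month, year)  # 2 January 2012
```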

Or you could use regular expressions (which I see you import but don't use) with groups. I don't remember the exact syntax offhand, but the regex is something like r'(.*)(Date>>)(.*)'. It matches the string "Date>>" together with whatever text comes before and after it, and the parentheses capture those three parts into numbered groups (1, 2 and 3).
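The pattern from this answer, written out in full as `r'(.*)(Date>>)(.*)'`, works as-is with `re.match`. A quick Python 3 check on the sample line shows what each numbered group holds:

```python
import re

line = "Date>>January 2. 2012 News 122"
# Group 1: text before the marker, group 2: the marker, group 3: the rest of the line
m = re.match(r"(.*)(Date>>)(.*)", line)
print(m.groups())  # ('', 'Date>>', 'January 2. 2012 News 122')
```

Group 1 is empty here because `Date>>` starts the line, and group 3 still contains ` News 122`, so the date fields need further splitting or a narrower pattern. Because `.` does not match a newline by default, running this on a whole file's contents only captures the rest of the first line.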

Upvotes: 0

Ian Leaman

Reputation: 145

You can use the string method .split(" ") to separate the text into a list of strings, split at the space character. Because month.day and year will always be in the same place, you can access them by their position in the resulting list. To separate month and day, use .split again, but this time split on ".".

Example:

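# theString is assumed to hold only the "month.day year" text, e.g. "January.2 2012"
theString = "January.2 2012"
parts = theString.split(" ")  # named parts to avoid shadowing the built-in list
year = parts[1]
month = parts[0].split(".")[0]
day = parts[0].split(".")[1]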

Upvotes: 1
