Fox_01

Reputation: 97

Python regex on csv file

I have a question, or rather, I'd like advice on the best way to approach this problem. My CSV file looks like this:

,02/12/2013,03/12/2013,04/12/2013,05/12/2013,06/12/2013,07/12/2013,08/12/2013,
06:00,"06:00 World Sport","06:00 World Sport","06:00 World Sport","06:00 World Sport","06:00 World Sport","06:00 World Sport","06:00 World Sport",06:00
,,,,,,,,
06:15,,,,,,,,06:15
,,,,,,,,
06:30,"06:30 Inside Africa: November 29, 2013","06:30 African Voices: Agatha Achindu","06:30 Inside the Louvre","06:30 Talk Asia: Franz Harary","06:30 Blueprint","06:30 Inside the Middle East","06:30 CNNGo",06:30

What I need to do is collect the dates from the header row (however many there are in one sheet) and put the matching date in front of each entry on every line, separated by a comma, like this:

02/12/2013, "06:00 World Sport", 03/12/2013, "06:00 World Sport", 04/12/2013, "06:00 World Sport"...
02/12/2013, "06:30 Inside Africa: November 29, 2013", 03/12/2013, "06:30 African Voices.."

And my starting code was like this:

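import fileinput
import re

# fnames (input file names) and output (an open file) are defined elsewhere
for line in fileinput.input(fnames):

    # write the first date found on the line
    if re.search(r'\d{2}/\d{2}/\d{4}.*', line):
        line_date = re.findall(r'\d{2}/\d{2}/\d{4}', line)[0]
        output.write(line_date + '\n')

    # write the first quoted title found on the line
    if re.search(r'\".+?\"', line):
        line_sadrzaj = re.findall(r'\".+?\"', line)[0]
        output.write(line_sadrzaj + '\n')

output.close()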

Do you have a better idea for this problem?

Maybe this way:

for line in fileinput.input(fnames):

    if re.search(r'\d{2}/\d{2}/\d{4}.*', line):
        line_date = re.findall(r'\d{2}/\d{2}/\d{4}.*', line)[0]
        line_split = re.split(r'\,', line_date)
        for line1 in line_split:
            output.write(line1 + '\n')

    if re.search(r'\".+?\".*', line):
        line_sadrzaj = re.findall(r'\".+?\".*', line)[0]
        line_split1 = re.split(r'\,', line_sadrzaj)
        for line2 in line_split1:
            output.write(line2 + '\n')
        #output.write(line_sadrzaj+'\n')

Upvotes: 1

Views: 2517

Answers (1)

sloth

Reputation: 101162

You don't need regex at all; just use the csv module to read the csv file, then transform the result to your desired output.
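
One reason the csv module is a safer fit than splitting on commas: a title in the question's data ("06:30 Inside Africa: November 29, 2013") has a comma inside the quotes. Here is a minimal sketch of the difference, assuming Python 3 and using a shortened row made up for illustration:

```python
import csv
import io

# Shortened sample row (illustrative, modelled on the data in the question).
row_text = '06:30,"06:30 Inside Africa: November 29, 2013","06:30 CNNGo",06:30\n'

# Naive splitting cuts the quoted title in two at its inner comma.
naive_fields = row_text.strip().split(',')

# csv.reader honours the quotes and keeps the title in a single field.
csv_fields = next(csv.reader(io.StringIO(row_text)))

print(len(naive_fields))  # 5
print(len(csv_fields))    # 4
print(csv_fields[1])      # 06:30 Inside Africa: November 29, 2013
```
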

Example:

import csv
with open('csv.csv') as text:
    table = list(csv.reader(text))

# get all dates (skipping first and last column)
dates = table[0][1:-1]

# get all shows (skipping first and last column and empty rows)
shows = filter(''.join, (t[1:-1] for t in table[1:]))

# join dates and shows back together and do some formatting
for line in [zip(dates, s) for s in shows]:
    print(', '.join('{}, "{}"'.format(*t) for t in line))

Result:

02/12/2013, "06:00 World Sport", 03/12/2013, "06:00 World Sport", 04/12/2013, "06:00 World Sport", 05/12/2013, "06:00 World Sport", 06/12/2013, "06:00 World Sport", 07/12/2013, "06:00 World Sport", 08/12/2013, "06:00 World Sport"
02/12/2013, "06:30 Inside Africa: November 29, 2013", 03/12/2013, "06:30 African Voices: Agatha Achindu", 04/12/2013, "06:30 Inside the Louvre", 05/12/2013, "06:30 Talk Asia: Franz Harary", 06/12/2013, "06:30 Blueprint", 07/12/2013, "06:30 Inside the Middle East", 08/12/2013, "06:30 CNNGo"
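
If you want the same logic as a reusable function, something like the following might work. It is only a sketch, assuming Python 3; the function name `pair_dates_with_shows` and the two-day sample are made up for illustration, and the layout (dates in the first row, times in the first and last columns) is taken from the question.

```python
import csv
import io

def pair_dates_with_shows(rows):
    """Return one formatted line per non-empty schedule row.

    `rows` is a list of lists as produced by csv.reader: the first row
    holds the dates, and the first and last columns hold the times.
    """
    dates = rows[0][1:-1]
    lines = []
    for row in rows[1:]:
        shows = row[1:-1]
        if not any(shows):  # skip spacer rows and empty time slots
            continue
        pairs = ('{}, "{}"'.format(d, s) for d, s in zip(dates, shows))
        lines.append(', '.join(pairs))
    return lines

# Two-day sample in the same layout as the question's file (illustrative).
sample = (
    ',02/12/2013,03/12/2013,\n'
    '06:00,"06:00 World Sport","06:00 World Sport",06:00\n'
    ',,,\n'
    '06:15,,,06:15\n'
    '06:30,"06:30 Inside Africa: November 29, 2013",'
    '"06:30 African Voices: Agatha Achindu",06:30\n'
)

result = pair_dates_with_shows(list(csv.reader(io.StringIO(sample))))
for line in result:
    print(line)
```

To save the lines instead of printing them, you could write `'\n'.join(result)` to a file opened in `'w'` mode.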

Upvotes: 3
