Data processing with Python on data separated by rows and columns

Question

I have a data that is separated by rows(date time row) as such

01-Jan-1990 00:00:01 ABCD
A  abcde fghijk lmnopq
     hsjfne qqq                     # EDITED WITH ADDITONAL SPILL OVER DATA with 	
B abcde fghijk lmnopq
01-Jan-1990 00:00:05 ABCD
A ancfjhr sfjerhj egen
C etfhw3uh uhuefwh fewvjh dfeg efwbywgefb
D wrf fcwewe fvwefwe fwef
01-Jan-1990 00:00:07 ABCD
A wfw fbebwu
B fewhuf ifgiwejhifgj fijweij

Would like to clean it in a way to separate the A, B , C, etc as shown in the 1st value after the date time row as a column and the values that come after the A,B,C as another column followed by capturing the date time and input as another column. Something like this

A,abcde fghijk lmnopq hsjfne qqq, 01-Jan-1990 00:00:01 #WOULD LIKE TO COMBINE THE SPILL DATA
B,abcde fghijk lmnopq, 01-Jan-1990 00:00:01
A,ancfjhr sfjerhj egen,01-Jan-1990 00:00:05
C,etfhw3uh uhuefwh fewvjh dfeg efwbywgefb,01-Jan-1990 00:00:05
D,wrf fcwewe fvwefwe fwefe,01-Jan-1990 00:00:05

etc etc etc

Would greatly appeciate if anyone can guide me along. I've try reading by doing a pattern match then to grab the following lines but unable to accomplish it.

import re
#Log Reading

log=open("IDM.txt","r")


for line in log:
    splitLine = line.split()
    iterator = iter(splitLine)
    datematch = (re.match('^(([0-9])|([0-2][0-9])|([3][0-1])- 
   (JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC)-\d{4}$',splitLine[0]))
if datematch:
    print(line)

Understand that the code as above is nothing alike to what i want to achieve therefore would like you guys to help me to guide me along and to show that i've tried some stuff. Thank you for your time

EDITED: Included the 3rd line of data to show the spill over of value from 2nd line of data with a tabspace before line

BernardL · Accepted Answer

It is always a good idea to open a file using with open(), then you can parse the lines as you want in a list, in my case I just checked to see the first 2 characters of the line is a digit, if it is, it stores the value to be added to the desired line later:

import csv
content = []

with open('IDM.txt','r') as f:
    lines = f.readlines()
    for idx,line in enumerate(lines):
        if line[:2].isdigit():
                date = line[:20]

        elif idx == len(lines)-1 or (line[0] != ' ' and lines[idx+1][0] != ' '):
            data = line[0] + ',' + line[1:].rstrip('
') 
            content.append(data+ ', '+ date)  

        elif lines[idx+1][0] == ' ':
            spill = lines[idx+1].rstrip('
').strip()
            data = line[0] + ',' + line[1:].rstrip('
') + ' ' + spill
            content.append(data+ ', '+ date)

        else:
            pass


with open('IDMOutput.csv','w') as f:
    for line in content:
        f.write("%s
" % line)

>>content
['A, abcde fghijk lmnopq hsjfne qqqqq, 01-Jan-1990 00:00:01',
 'B, abcde fghijk lmnopq, 01-Jan-1990 00:00:01',
 'A, ancfjhr sfjerhj egen, 01-Jan-1990 00:00:05',
 'C, etfhw3uh uhuefwh fewvjh dfeg efwbywgefb, 01-Jan-1990 00:00:05',
 'D, wrf fcwewe fvwefwe fwef, 01-Jan-1990 00:00:05',
 'A, wfw fbebwu, 01-Jan-1990 00:00:07',
 'B, fewhuf ifgiwejhifgj fijweij, 01-Jan-1990 00:00:07']

Edited: Added rstrip to remove ' ' and included timestamp and spill over updated in question with the output.

Data processing with Python on data separated by rows and columns

Answers (2)

Update: Final and last solution as `Generalization`:

Related Questions

Data processing with Python on data separated by rows and columns

Answers (2)

Update: Final and last solution as Generalization:

Related Questions

Update: Final and last solution as `Generalization`: