Reputation: 33
I have a text file with some data, and I need to load it into a data frame. This is my text file:
2012/02/03 18:55:54 SampleClass1 verb detail for id 19471668
verb detail for id 185289
verb detail for id 185289
verb detail for id 1852849
2012/03/03 18:55:54 SampleClass8 detail for id 2181536
2012/04/03 18:55:54 SampleClass1 verb detail for id 1765383670
2012/05/03 18:55:54 SampleClass9 verb detail for id 1666944491
2012/06/03 18:55:54 SampleClass8 detail for id 799914029 verb detail for id 185229
I want to split the date and the time into separate columns, keep the rest of each line as a string, and then convert the result into a data frame.
My expected output:
date time desc
2012/02/03 18:55:54 SampleClass9 verb detail for id 1947166588
verb detail for id 185289
verb detail for id 185289
verb detail for id 1852849
2012/03/03 18:55:54 SampleClass8 detail for id 218851536
verb detail for id 1852829
verb detail for id 185289
verb detail for id 1852849
2012/04/03 18:55:54 SampleClass1 verb detail for id 1765383670
verb detail for id 1852829
verb detail for id 1852829
verb detail for id 1852849
2012/05/03 18:55:54 SampleClass9 verb detail for id 1666944491
verb detail for id 1852829
verb detail for id 1852829
verb detail for id 18528429
2012/06/03 18:55:54 SampleClass8 detail for id 799914029 verb detail for id 1852844029
verb detail for id 1852829
verb detail for id 1852829
verb detail for id 18528429
Upvotes: 0
Views: 250
Reputation: 290
For the data you posted, the code below does the job.
import csv
import pandas as pd

file = "/path/to/file/"

# Open the text file
with open(file, "r", newline="") as fp:
    # Read the text file and use a space delimiter
    reader = csv.reader(fp, delimiter=" ")
    rows = []
    # Loop through the rows
    for row in reader:
        # If the row is empty, skip it
        if not row:
            continue
        # If the first character of the row is a digit, join the columns
        # after column 2, as columns one and two are already separated
        elif row[0][0].isdigit():
            rows.append(row[:2] + [' '.join(row[2:])])
        # Otherwise add two empty columns and join the whole row into one
        else:
            rows.append(['', ''] + [' '.join(row)])

df = pd.DataFrame(rows, columns=['date', 'time', 'desc'])
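If the file may contain quote characters or other lines that happen to start with a digit, a regular expression is a possible alternative to the space-delimited csv reader. The following is only a sketch, assuming every timestamped line starts with a `YYYY/MM/DD HH:MM:SS` prefix as in the sample; the names `TIMESTAMP_LINE` and `split_log_lines` are made up for illustration and are not part of the code above.

```python
import re

# Assumption: timestamped lines look like "2012/02/03 18:55:54 <rest of line>"
TIMESTAMP_LINE = re.compile(r"^(\d{4}/\d{2}/\d{2})\s+(\d{2}:\d{2}:\d{2})\s+(.*)$")


def split_log_lines(lines):
    """Return [date, time, desc] rows; lines without a timestamp get empty date/time."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        match = TIMESTAMP_LINE.match(line)
        if match:
            # Three captured groups: date, time, remaining description
            rows.append(list(match.groups()))
        else:
            # Continuation line: keep the whole text as the description
            rows.append(["", "", line])
    return rows


# A few lines taken from the sample in the question
sample = [
    "2012/02/03 18:55:54 SampleClass1 verb detail for id 19471668",
    "verb detail for id 185289",
    "",
    "2012/03/03 18:55:54 SampleClass8 detail for id 2181536",
]
rows = split_log_lines(sample)
```

The resulting `rows` list has the same shape as the one built with `csv.reader`, so it can be passed to `pd.DataFrame(rows, columns=['date', 'time', 'desc'])` in the same way. To process a real file, pass the open file object instead of `sample`.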
Upvotes: 1