Reputation: 131
I am reading a log line by line and only want to print certain columns of each line. In a bash script I would use awk and $ to separate the fields, but I can't figure out how to do it in Python. I tried using split, but it doesn't do exactly what I want.
My code right now:
for line in file:
    if STORED_PROCS_BEGIN in line:
        log.write(line)
    elif STORED_PROCS_FINISHED in line:
        log.write(line)
    elif STORED_TASK_BEGIN in line:
        log.write(line)
    elif STORED_TASK_FINISHED in line:
        log.write(line)
    elif ACTUATE_REPORT_SCHEDULE in line:
        break
So I am trying to format the line before it is passed into write().
Example of what I want:
date time info junk1 junk2 name => date time info name
Edit: I got an idea that I could split the line, extract the fields I want, and then join them together. But there has to be a better way.
Upvotes: 1
Views: 99
Reputation: 111
I'm assuming the columns are separated by tabs. If you really do not want to do:
columns = line.split("\t")
line = "\t".join(columns[:3] + columns[5:])
or the more compact and uglier:
line = "\t".join(line.split("\t")[:3] + line.split("\t")[5:])
...you could use regex replace:
line = re.sub(r'(\S+\t\S+\t\S+)\t\S+\t\S+\t(\S+)', r'\1\t\2', line)
\t = tab
\S+ = one or more non-whitespace characters
() = group
This captures the first three columns as group \1 and the last column as group \2, then replaces the whole match with groups \1 and \2 separated by a tab.
Run in an interactive Python session:
>>> re.sub(r'(\S+\t\S+\t\S+)\t\S+\t\S+\t(\S+)', r'\1\t\2', line)
'date\ttime\tinfo\tname'
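As a quick check that the split/join and regex approaches above agree, here is a minimal sketch. It assumes a tab-separated line with exactly six columns; the sample line is made up for illustration and real log lines may look different.

```python
import re

# Hypothetical sample line (assumption: six tab-separated columns).
line = "date\ttime\tinfo\tjunk1\tjunk2\tname"

# Approach 1: split on tabs, keep columns 0-2 and everything from column 5 on.
columns = line.split("\t")
by_split = "\t".join(columns[:3] + columns[5:])

# Approach 2: capture the first three columns and the sixth, drop the two between.
by_regex = re.sub(r'(\S+\t\S+\t\S+)\t\S+\t\S+\t(\S+)', r'\1\t\2', line)

print(repr(by_split))
print(by_split == by_regex)
```

If the columns turn out to be separated by spaces rather than tabs, neither version will match as written, so it is worth checking the separator in the real log first.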
Upvotes: 1
Reputation: 204
Try this one, which drops fields by their value:
' '.join(filter(lambda x: x not in ['junk1', 'junk2'] , line.split()))
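Since the junk fields in a real log probably differ from line to line, filtering by position may be closer to what is wanted. A minimal sketch, where drop_columns is a hypothetical helper name (not from the question) and the fields are assumed to be whitespace-separated:

```python
def drop_columns(line, drop):
    """Return the line without the whitespace-separated fields
    at the given 0-based positions."""
    drop = set(drop)
    fields = line.split()
    return ' '.join(f for i, f in enumerate(fields) if i not in drop)

# Drop the 4th and 5th fields (indices 3 and 4).
result = drop_columns('date time info junk1 junk2 name', [3, 4])
print(result)
```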
Upvotes: 1
Reputation: 59436
You can split a line into its words using split(), that's right. Then you can index the columns you want to have in the output:
line = 'date time info junk1 junk2 name'
parts = line.split()
parts_I_want = parts[0:3] + parts[5:6]
print(' '.join(parts_I_want))
If you just want to remove some columns, you can also use del:
line = 'date time info junk1 junk2 name'
parts = line.split()
del parts[4] # junk2
del parts[3] # junk1
print(' '.join(parts))
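To tie this back to the loop in the question, here is a rough sketch of an awk-like helper. The names pick_columns and filter_log, the marker strings, and the sample lines are all placeholders made up for illustration; the real constants from the question would go in their place. It assumes whitespace-separated fields.

```python
import io

def pick_columns(line, indices):
    """Keep only the whitespace-separated fields at the given 0-based
    positions, similar to awk '{print $1, $2, ...}' (awk counts from 1)."""
    parts = line.split()
    return ' '.join(parts[i] for i in indices if i < len(parts))

def filter_log(lines, out, markers, stop_marker, indices):
    """Write the selected columns of every line containing a marker,
    stopping at the first line that contains stop_marker."""
    for line in lines:
        if any(marker in line for marker in markers):
            out.write(pick_columns(line, indices) + '\n')
        elif stop_marker in line:
            break

# Hypothetical sample data standing in for the real log file.
sample = [
    "d1 t1 PROCS_BEGIN junk1 junk2 alpha\n",
    "d2 t2 something junk1 junk2 beta\n",
    "d3 t3 REPORT_SCHEDULE junk1 junk2 gamma\n",
    "d4 t4 PROCS_BEGIN junk1 junk2 delta\n",
]
out = io.StringIO()
filter_log(sample, out, markers=("PROCS_BEGIN",),
           stop_marker="REPORT_SCHEDULE", indices=(0, 1, 2, 5))
print(out.getvalue())
```

Passing the indices as a parameter keeps the column choice in one place, so changing which fields are printed does not require touching the loop.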
Upvotes: 2