Alan

Writing only certain columns of a line (separated by spaces) to a file

I am reading a log line by line, and I am trying to output only certain columns of each line. In a bash script I would use awk and its $1, $2, ... field variables to separate them, but I can't figure out how to do it in Python. I tried using split(), but it doesn't do exactly what I want.

My code right now:

for line in file:
    if STORED_PROCS_BEGIN in line:
        log.write(line)
    elif STORED_PROCS_FINISHED in line:
        log.write(line)
    elif STORED_TASK_BEGIN in line:
        log.write(line)
    elif STORED_TASK_FINISHED in line:
        log.write(line)
    elif ACTUATE_REPORT_SCHEDULE in line:
        break

So I am trying to format the line before it is passed to write().

Example of what I want:

date time info junk1 junk2 name => date time info name

Edit: I had the idea that I could split the line, extract the fields I want and then join them back together. But there has to be a better way.
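A minimal sketch of that split, select and join idea inside the loop above. The marker values and the names `MARKERS`, `STOP_MARKER`, `KEEP` and `filter_log` are hypothetical stand-ins for the script's real constants, and `io.StringIO` objects replace the real input file and log so the example runs on its own.

```python
import io

# Hypothetical marker values; the real script defines its own constants.
MARKERS = ("PROCS_BEGIN", "PROCS_FINISHED", "TASK_BEGIN", "TASK_FINISHED")
STOP_MARKER = "REPORT_SCHEDULE"

# Zero-based indices of the columns to keep: date, time, info, name.
KEEP = (0, 1, 2, 5)


def filter_log(infile, log):
    """Copy the KEEP columns of every line containing a marker into log."""
    for line in infile:
        if any(marker in line for marker in MARKERS):
            parts = line.split()  # splits on any run of whitespace, drops "\n"
            log.write(" ".join(parts[i] for i in KEEP) + "\n")
        elif STOP_MARKER in line:
            break  # nothing after the report schedule is copied


source = io.StringIO(
    "d1 t1 INFO junk1 junk2 PROCS_BEGIN\n"
    "d1 t2 DEBUG junk1 junk2 unrelated\n"
    "d1 t3 INFO junk1 junk2 TASK_FINISHED\n"
    "d1 t4 INFO junk1 junk2 REPORT_SCHEDULE\n"
    "d1 t5 INFO junk1 junk2 TASK_BEGIN\n"
)
log = io.StringIO()
filter_log(source, log)
print(log.getvalue(), end="")
# d1 t1 INFO PROCS_BEGIN
# d1 t3 INFO TASK_FINISHED
```

A matching line with fewer than six whitespace-separated fields would raise IndexError here, so a real log may need a length check first.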

Answers (3)

torkil

I'm assuming the columns are separated by tabs. If you really don't want to do:

columns = line.split("\t")
line = "\t".join(columns[:3] + columns[5:])

or the more compact and uglier:

line = "\t".join(line.split("\t")[:3] + line.split("\t")[5:])

...you could use a regex substitution:

import re
line = re.sub(r'(\S+\t\S+\t\S+)\t\S+\t\S+\t(\S+)', r'\1\t\2', line)

\t = tab

\S+ = one or more non-whitespace characters

() = group

This captures the first three columns as group \1 and the last column as group \2, then replaces the whole match with group 1 and group 2 separated by a tab.

Run in the interactive Python interpreter:

>>> import re
>>> line = 'date\ttime\tinfo\tjunk1\tjunk2\tname'
>>> re.sub(r'(\S+\t\S+\t\S+)\t\S+\t\S+\t(\S+)', r'\1\t\2', line)
'date\ttime\tinfo\tname'
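To see what each group holds, and to avoid recompiling the pattern for every line of a large log, the expression can be compiled once with `re.compile`. The sample lines below are made up for illustration. The sketch also shows a caveat: a line with fewer than six tab-separated columns does not match, so `sub` returns it unchanged.

```python
import re

# Same pattern as in the answer, compiled once outside any loop.
pattern = re.compile(r'(\S+\t\S+\t\S+)\t\S+\t\S+\t(\S+)')

line = 'date\ttime\tinfo\tjunk1\tjunk2\tname'
match = pattern.search(line)
print(match.group(1))  # date, time and info, still tab-separated
print(match.group(2))  # name

print(pattern.sub(r'\1\t\2', line))
# A short line does not match, so it passes through untouched.
print(pattern.sub(r'\1\t\2', 'date\ttime\tinfo'))
```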

kuter

Try this one:

' '.join(filter(lambda x: x not in ['junk1', 'junk2'] , line.split()))
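This filters by value: any field that happens to equal `junk1` or `junk2` is dropped wherever it appears, and a junk column holding some other value survives. If the unwanted columns are known by position instead, a sketch using `enumerate` follows; the `DROP` set and the `drop_columns` name are hypothetical choices matching the question's example.

```python
# Zero-based positions of the columns to drop (junk1 and junk2 in the example).
DROP = {3, 4}


def drop_columns(line):
    """Remove the columns at the positions in DROP, whatever their values."""
    return ' '.join(field for i, field in enumerate(line.split()) if i not in DROP)


print(drop_columns('date time info junk1 junk2 name'))
# date time info name
print(drop_columns('date time info 404 /tmp/x name'))
# date time info name
```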

Alfe

That's right: you can split a line into its words using split(). Then you can slice out the columns you want in the output:

line = 'date time info junk1 junk2 name'
parts = line.split()
parts_I_want = parts[0:3] + parts[5:6]
print(' '.join(parts_I_want))
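Since the question mentions awk: awk fields are numbered from 1 (`$1`), while Python list indices start at 0, so awk's `{print $1, $2, $3, $6}` corresponds to indices 0, 1, 2 and 5. A small hypothetical helper, `awk_fields`, that accepts awk-style field numbers might look like this:

```python
def awk_fields(line, *fields):
    """Mimic awk's print $1, $2, ...: fields are 1-based, output joined by spaces."""
    parts = line.split()  # like awk's default splitting on runs of whitespace
    return ' '.join(parts[n - 1] for n in fields)


print(awk_fields('date time info junk1 junk2 name', 1, 2, 3, 6))
# date time info name
```

Field numbers must start at 1. Unlike awk, which prints an empty string for a field past the end of the line, this raises IndexError.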

If you just want to remove some columns, you can also use del:

line = 'date time info junk1 junk2 name'
parts = line.split()
del parts[4]  # junk2
del parts[3]  # junk1
print(' '.join(parts))
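The two `del` statements above remove index 4 before index 3 on purpose: deleting index 3 first would shift `junk2` to index 3 and `name` to index 4, so `del parts[4]` would then remove `name`. When more columns have to go, deleting in descending index order keeps that property. A sketch, with `remove_columns` as a hypothetical name:

```python
def remove_columns(line, indices):
    """Delete the given zero-based columns from a whitespace-separated line."""
    parts = line.split()
    # Delete from the highest index down so earlier deletions
    # do not shift the positions of columns still to be removed.
    for i in sorted(set(indices), reverse=True):
        del parts[i]
    return ' '.join(parts)


print(remove_columns('date time info junk1 junk2 name', [3, 4]))
# date time info name
```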
