trouselife

Reputation: 969

Extracting data from columns in text file using Python

I am trying to extract data from columns in a text file. I also need to take the header of one of the columns and add a new column in which that header is repeated on every row, i.e.:

col1 col2 col3
1     1     1
2     2     2
3     3     3

into:

col1 col2 col3  col3
1     1     1   col3
2     2     2   col3
3     3     3   col3

I am struggling to isolate the header.

for line in my_file:
    line = line.split("\t")
    column = line[0:3] #col1-3

How do I get the header of col3 and then repeat it on every row? Do I have to split the file by "\n" first, then by "\t"?

I tried to do this but got an error message.

Upvotes: 0

Views: 2266

Answers (3)

Prashant Shukla

Reputation: 762

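with open('/home/prashant/Desktop/data.txt') as f:
    for l in f:
        # Iterating over the file yields one line at a time;
        # strip() removes the trailing newline before splitting.
        print(l.strip().split("\n"))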

This might solve your problem. The results I'm getting are:

[col1 col2 col3]

[1 1 1]

[2 2 2]

[3 3 3]
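
To go one step further and append the header of the third column to every row, a minimal sketch could look like the following. It assumes the data is tab-separated as in the question; `append_header_column` and `sample` are hypothetical names used only for illustration. Iterating over a file already yields one line at a time, so only the split on "\t" is needed.

```python
def append_header_column(lines, col_index=2, sep="\t"):
    """Return every row as a list with the header of column `col_index` appended."""
    # Drop the trailing newline, skip blank lines, then split into columns.
    rows = [line.rstrip("\n").split(sep) for line in lines if line.strip()]
    header = rows[0][col_index]  # the first row holds the column headers
    return [row + [header] for row in rows]


# In-memory stand-in for the lines of the text file (assumed tab-separated).
sample = ["col1\tcol2\tcol3\n", "1\t1\t1\n", "2\t2\t2\n", "3\t3\t3\n"]

for row in append_header_column(sample):
    print("\t".join(row))
```

With a real file, the open file object can be passed in place of `sample`, since it is also an iterable of lines.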

Upvotes: 1

Rohan Amrute

Reputation: 774

Why don't you use pandas?

     import pandas as pd
     df = pd.read_csv("filename.tsv",sep="\t")

To select the third column together with its header, you can use

      df.iloc[:, 2:]
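
To actually add the repeated header as a new column, something along these lines might work. This is only a sketch: it reads an in-memory sample instead of a real file, and `header_copy` is a hypothetical temporary column name chosen to avoid duplicate labels inside the DataFrame. The duplicate header is only written at export time.

```python
import io

import pandas as pd

# In-memory stand-in for the tab-separated file from the question.
sample = "col1\tcol2\tcol3\n1\t1\t1\n2\t2\t2\n3\t3\t3\n"
df = pd.read_csv(io.StringIO(sample), sep="\t")

header = df.columns[-1]      # name of the last column, here 'col3'
df["header_copy"] = header   # scalar is broadcast to every row

# Write the temporary column under the original header name on output.
out_header = list(df.columns[:-1]) + [header]
result = df.to_csv(sep="\t", index=False, header=out_header)
print(result)
```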

Upvotes: 1

Martin Evans

Reputation: 46789

You could use Python's CSV module as follows. This can handle the splitting up of all of the columns for you automatically. By default it assumes columns are separated by commas, but this can be switched to a tab by specifying which delimiter to use:

import csv

with open('input.csv', 'rb') as f_input, open('output.csv', 'wb') as f_output:
    csv_input = csv.reader(f_input, delimiter='\t')
    csv_output = csv.writer(f_output, delimiter='\t')
    header = next(csv_input)
    csv_output.writerow(header + [header[-1]])

    for cols in csv_input:
        print cols
        csv_output.writerow(cols + [header[-1]])

For your given input, you will get the following output (columns are tab delimited):

col1    col2    col3    col3
1   1   1   col3
2   2   2   col3
3   3   3   col3

Tested using Python 2.7.9
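
On Python 3 the same approach should work with text-mode files instead of 'rb' and 'wb'. The sketch below uses in-memory buffers so it runs on its own; `add_header_column` is a hypothetical helper name, and `lineterminator="\n"` is an assumption made here to keep the output free of "\r\n" line endings.

```python
import csv
import io


def add_header_column(f_input, f_output, delimiter="\t"):
    """Copy rows from f_input to f_output, appending the last header to each row."""
    csv_input = csv.reader(f_input, delimiter=delimiter)
    csv_output = csv.writer(f_output, delimiter=delimiter, lineterminator="\n")
    header = next(csv_input)                    # first row is the header
    csv_output.writerow(header + [header[-1]])  # header row gets the extra column too
    for cols in csv_input:
        csv_output.writerow(cols + [header[-1]])


# In-memory stand-ins for the input and output files.
source = io.StringIO("col1\tcol2\tcol3\n1\t1\t1\n2\t2\t2\n3\t3\t3\n")
target = io.StringIO()
add_header_column(source, target)
print(target.getvalue())
```

With real files on Python 3, open them as `open('input.csv', newline='')` and `open('output.csv', 'w', newline='')` and pass them to the function.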

Upvotes: 0
