Reputation: 969
I am trying to extract data from columns in a text file. I also need to add an extra column that repeats the header of one of the columns on every row, i.e.:
col1 col2 col3
1 1 1
2 2 2
3 3 3
into:
col1 col2 col3 col3
1 1 1 col3
2 2 2 col3
3 3 3 col3
I am struggling to isolate the header.
for line in my_file:
    line = line.split("\t")
    column = line[0:3]  # col1-3
How do I get the header from col3 and then repeat it on every row? Do I have to split the line by "\n" first, then by "\t"?
I tried to do this but got an error message.
Upvotes: 0
Views: 2266
Reputation: 762
with open('/home/prashant/Desktop/data.txt') as f:
    for l in f:
        # Iterating over the file already yields one line at a time
        print(l.strip().split("\n"))
This might solve your problem. The results I'm getting are:
[col1 col2 col3]
[1 1 1]
[2 2 2]
[3 3 3]
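To address the splitting question directly: iterating over a file already gives one line per iteration, so there should be no need to split on "\n" first. Stripping the trailing newline and splitting on "\t" is enough. Below is a minimal Python 3 sketch of that idea. The function name `append_header_column` and the in-memory sample data are made up for illustration and are not from the question.

```python
import io


def append_header_column(lines, sep="\t"):
    """Yield each row with the last column's header appended as an extra field."""
    rows = iter(lines)
    header_line = next(rows, None)
    if header_line is None:
        return  # empty input: nothing to yield
    header = header_line.rstrip("\n").split(sep)
    last = header[-1]  # header of the final column, e.g. "col3"
    yield sep.join(header + [last])
    for line in rows:
        fields = line.rstrip("\n").split(sep)
        yield sep.join(fields + [last])


# Hypothetical sample standing in for the tab-separated file
sample = io.StringIO("col1\tcol2\tcol3\n1\t1\t1\n2\t2\t2\n3\t3\t3\n")
result = list(append_header_column(sample))
for row in result:
    print(row)
```

With a real file, an open file object can be passed in place of `sample`.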
Upvotes: 1
Reputation: 774
Why don't you use pandas?
import pandas as pd
df = pd.read_csv("filename.tsv",sep="\t")
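To get the third column together with its header, you can use positional indexing (the older df.ix indexer has been removed from recent pandas versions):
df.iloc[:, 2:]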
Upvotes: 1
Reputation: 46789
You could use Python's csv module as follows. It handles splitting all of the columns for you automatically. By default it assumes columns are separated by commas, but this can be switched to tabs by specifying the delimiter:
import csv
# Python 2: the csv module expects files opened in binary mode
with open('input.csv', 'rb') as f_input, open('output.csv', 'wb') as f_output:
    csv_input = csv.reader(f_input, delimiter='\t')
    csv_output = csv.writer(f_output, delimiter='\t')
    # The first row is the header; repeat its last entry as a new column
    header = next(csv_input)
    csv_output.writerow(header + [header[-1]])
    for cols in csv_input:
        print cols
        csv_output.writerow(cols + [header[-1]])
For your given input, you will get the following output (columns are tab delimited):
col1 col2 col3 col3
1 1 1 col3
2 2 2 col3
3 3 3 col3
Tested using Python 2.7.9
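On Python 3 the same approach needs a small change: the csv module works on text-mode files opened with newline="" instead of 'rb'/'wb'. Here is a rough sketch of how that might look. The helper name `repeat_last_header` and the in-memory sample are my own additions for illustration, and the lineterminator is set to "\n" only to keep the output easy to compare.

```python
import csv
import io


def repeat_last_header(f_input, f_output, delimiter="\t"):
    """Copy delimited rows, appending the last header name to every row."""
    reader = csv.reader(f_input, delimiter=delimiter)
    writer = csv.writer(f_output, delimiter=delimiter, lineterminator="\n")
    header = next(reader, None)
    if not header:
        return  # empty input or blank header row: nothing to write
    writer.writerow(header + [header[-1]])
    for cols in reader:
        writer.writerow(cols + [header[-1]])


# With real files on Python 3, open both in text mode with newline="":
# with open("input.csv", newline="") as f_in, \
#         open("output.csv", "w", newline="") as f_out:
#     repeat_last_header(f_in, f_out)

# Hypothetical in-memory sample matching the question's data
source = io.StringIO("col1\tcol2\tcol3\n1\t1\t1\n2\t2\t2\n3\t3\t3\n")
target = io.StringIO()
repeat_last_header(source, target)
output_text = target.getvalue()
print(output_text)
```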
Upvotes: 0