Reputation: 2874
I have many large tab-separated files saved as .txt, each of which has seven columns with the following headers:
#column_titles = ["col1", "col2", "col3", "col4", "col5", "col6", "text"]
I would like to simply extract the final column, named text, and save it into a new file, with each row being a row from the original file; the values are all strings.
EDIT: This is not a duplicate of a similar problem, as splitlines() was not necessary in my case. Only the order of the steps needed to be improved.
Based on several other posts, here is my current attempt:
import csv

# File names: to read in from and read out to
input_file = "tester_2014-10-30_til_2014-08-01.txt"
output_file = input_file + "-SA_input.txt"

## ==================== ##
##  Using module 'csv'  ##
## ==================== ##

with open(input_file) as to_read:
    reader = csv.reader(to_read, delimiter="\t")
    desired_column = [6]    # text column
    for row in reader:
        myColumn = list(row[i] for i in desired_column)

with open(output_file, "wb") as tmp_file:
    writer = csv.writer(tmp_file)
    for row in myColumn:
        writer.writerow(row)
What I am getting is simply the text field from the 2624th row of my input file, with each of the letters in that string separated out:
H,o,w, ,t,h,e, ,t,e,a,m, ,d,i,d, ,T,h,u,r,s,d,a,y, ,-, ,s,e,e, ,h,e,r,e
I know very little in the world of programming is random, but this is definitely strange!
This post is pretty similar to my needs, but misses the writing and saving parts, which I am also not sure about.
I have looked into using the pandas toolbox (as per one of those links above), but I am unable to because of my Python installation, so please only suggest solutions using csv or other built-in modules!
Upvotes: 2
Views: 7046
Reputation: 148965
You must process the file one row at a time: read, parse and write.
import csv

# File names: to read in from and read out to
input_file = "tester_2014-10-30_til_2014-08-01.txt"
output_file = input_file + "-SA_input.txt"

## ==================== ##
##  Using module 'csv'  ##
## ==================== ##

with open(input_file) as to_read:
    with open(output_file, "wb") as tmp_file:
        reader = csv.reader(to_read, delimiter="\t")
        writer = csv.writer(tmp_file)
        desired_column = [6]    # text column
        for row in reader:      # read one row at a time
            myColumn = list(row[i] for i in desired_column)  # build the output row (process)
            writer.writerow(myColumn)                        # write it
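If you happen to be on Python 3 rather than Python 2, the csv module expects text-mode file objects opened with newline=""; a minimal sketch of the same idea, assuming the same file names, would be:
with open(input_file, newline="") as to_read, open(output_file, "w", newline="") as tmp_file:
    reader = csv.reader(to_read, delimiter="\t")
    writer = csv.writer(tmp_file)
    for row in reader:
        writer.writerow([row[6]])   # the "text" column, written as a single field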
Upvotes: 2
Reputation: 146
I would go for this simple solution:
text_strings = []                                          # empty list to store the last-column text
with open('my_file') as ff:
    ss = ff.readlines()                                    # read all lines into a list of strings
for s in ss:
    text_strings.append(s.rstrip('\n').split('\t')[-1])    # last column (newline stripped) into the list
with open('out_file', 'w') as outf:                        # open the output file for writing
    outf.write('\n'.join(text_strings))                    # write everything to the output file
Using a list comprehension, you can translate the last column of the ss strings into text_strings faster, and in one line:
text_strings = [k.rstrip("\n").split("\t")[-1] for k in ss]
There are other simplifications possible; you get the idea.
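One possible condensed sketch along those lines, reusing the same hypothetical my_file/out_file names: read, extract and write in a single pass, without keeping all the input lines in memory:
with open('my_file') as ff, open('out_file', 'w') as outf:
    outf.write('\n'.join(line.rstrip('\n').split('\t')[-1] for line in ff))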
The problem in your code appears at these two lines:
for row in reader:
    myColumn = list(row[i] for i in desired_column)
First, as originally posted there was no indentation under the for-loop, so nothing happened inside it; in fact, on my machine it throws an IndentationError, so that is probably just a typo. Assuming the line is indented, then at each step of the for-loop you overwrite the myColumn value with the one built from the new row, so in the end you are left with only the string from the last row of the file.
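For that first point, a minimal sketch of the fix (assuming the reader and desired_column set up as in the question) is to collect the value from every row instead of overwriting a single variable:
myColumn = []                    # one entry per input row
for row in reader:
    myColumn.append(row[6])      # keep only the "text" field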
Second, list applied to a string (as in your code) converts the string into a list of chars:
In [5]: s = 'AAAA'
In [6]: list(s)
Out[6]: ['A', 'A', 'A', 'A']
which is exactly what you see in the output.
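As a follow-up to this second point: csv.writer.writerow iterates over whatever sequence it is given, so passing a bare string produces one character per field, while wrapping the string in a one-element list writes it as a single field:
writer.writerow('see here')      # -> s,e,e, ,h,e,r,e   (one field per character)
writer.writerow(['see here'])    # -> see here          (one field)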
Upvotes: 1