john tan

Reputation: 123

Using split to read specific string from a line in a file

Given the text file

sample.txt

2012-01-01  09:00   San Diego   Men's Clothing    214.05    Amex
2012-01-01  09:00   San Diego   Women's Clothing  153.57    Visa
2012-01-01  09:00   Omaha       Music             66.08     Cash

I want to be able to read only the text for the third column. This code

for line in open("sample.txt"):
    city = line.split()[2]
    print(city)

can read the third column to a certain degree:

San
San
Omaha

but what I want is:

San Diego
San Diego
Omaha

How do I do this?

Upvotes: 2

Views: 1044

Answers (5)

Jonathon McMurray

Reputation: 2991

It appears your input file has fixed-width fields, so you may be able to slice each line by character position instead of splitting it, e.g.

>>> for line in open('sample.txt'):
...     print(line[20:32])
...
San Diego
San Diego
Omaha

You can add .strip() to the slice to trim the trailing padding spaces if you need the value for further processing.
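
A minimal sketch of the slicing approach with .strip() applied. It assumes the city column always occupies character positions 20 through 31, as it does in the sample rows. The names SAMPLE, CITY_COLUMN and city_from_fixed_width are illustrative, and the rows are inlined here rather than read from sample.txt so the snippet runs on its own.

```python
# Sample rows copied from the question; in practice they would come from the file.
SAMPLE = (
    "2012-01-01  09:00   San Diego   Men's Clothing    214.05    Amex\n"
    "2012-01-01  09:00   San Diego   Women's Clothing  153.57    Visa\n"
    "2012-01-01  09:00   Omaha       Music             66.08     Cash\n"
)

# Assumed column boundaries, measured from the sample rows above.
CITY_COLUMN = slice(20, 32)


def city_from_fixed_width(line):
    """Return the city field with its padding spaces removed."""
    return line[CITY_COLUMN].strip()


cities = [city_from_fixed_width(line) for line in SAMPLE.splitlines()]
print(cities)
```

If the column widths ever change, only CITY_COLUMN needs to be updated, but this approach breaks on files whose columns are not aligned.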

Upvotes: 1

Ezekiel Sebastine

Reputation: 258

You will need to preprocess your input file by adding a delimiter, which you then pass to split(). Like this:

2012-01-01,  09:00,   San Diego,   Men's Clothing,    214.05,    Amex
2012-01-01,  09:00,   San Diego,   Women's Clothing,  153.57,    Visa
2012-01-01,  09:00,   Omaha,       Music,             66.08,     Cash

Then

for line in open("sample.txt"):
    # strip() removes the padding spaces left around the field
    city = line.split(",")[2].strip()
    print(city)

Upvotes: -1

RoadRunner

Reputation: 26335

Since the items in sample.txt are separated by at least two spaces, you need to use split('  ') (with two spaces) instead. By default, split() splits on every run of whitespace, turning "Men's Clothing" into ["Men's", "Clothing"], which is not what you want.

First thing you can do is view your items with:

with open('sample.txt') as in_file:
    for line in in_file:
        items = [x.strip() for x in line.strip().split('  ') if x]
        print(items)

Which outputs:

['2012-01-01', '09:00', 'San Diego', "Men's Clothing", '214.05', 'Amex']
['2012-01-01', '09:00', 'San Diego', "Women's Clothing", '153.57', 'Visa']
['2012-01-01', '09:00', 'Omaha', 'Music', '66.08', 'Cash']

Now, to extract the third column, print items[2] inside the loop instead:

print(items[2])

Which gives:

San Diego
San Diego
Omaha
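
As an alternative to splitting on exactly two spaces and filtering out the empty strings, a regular expression can split on any run of two or more whitespace characters in one step. This is a sketch, not part of the code above: it assumes no field ever contains two consecutive spaces, and the names SAMPLE and split_columns are illustrative.

```python
import re

# Sample rows copied from the question; in practice they would come from the file.
SAMPLE = (
    "2012-01-01  09:00   San Diego   Men's Clothing    214.05    Amex\n"
    "2012-01-01  09:00   San Diego   Women's Clothing  153.57    Visa\n"
    "2012-01-01  09:00   Omaha       Music             66.08     Cash\n"
)


def split_columns(line):
    """Split a line wherever two or more whitespace characters occur in a row."""
    return re.split(r"\s{2,}", line.strip())


rows = [split_columns(line) for line in SAMPLE.splitlines()]
cities = [row[2] for row in rows]
print(cities)
```

Because a single space never matches the pattern, values such as "San Diego" and "Men's Clothing" stay intact.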

Upvotes: 0

zdgriffith

Reputation: 174

Your text file separates fields with at least two spaces, so splitting on two spaces and then removing any leftover spaces from the ends with strip() works.

with open('sample.txt', 'r') as file_handle:
    for line in file_handle:
        city=line.split('  ')[2].strip()
        print(city)

yields:

San Diego
San Diego
Omaha

Upvotes: 0

marcelotokarnia

Reputation: 386

It looks like your file is separated by tabs (\t).

Have you tried splitting it by tabs?

Instead of city=line.split()[2] try city=line.split('\t')[2].

Anyway, it looks like this file was generated by Excel or a similar tool. Have you tried exporting it to CSV (comma-separated values) format instead of plain text?

Then you can simply split on commas, like city=line.split(',')[2]
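
If the data does end up as a delimited export, Python's built-in csv module can do the splitting. The sketch below is only an illustration: CSV_TEXT is a hypothetical export of the sample rows (the real export may look different), and read_column is a made-up helper name.

```python
import csv
import io

# Hypothetical CSV export of the sample data; the real export may differ.
CSV_TEXT = (
    "2012-01-01,09:00,San Diego,Men's Clothing,214.05,Amex\n"
    "2012-01-01,09:00,San Diego,Women's Clothing,153.57,Visa\n"
    "2012-01-01,09:00,Omaha,Music,66.08,Cash\n"
)


def read_column(text, index, delimiter=","):
    """Return the values found at position `index` in every non-empty row."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    return [row[index] for row in reader if row]


cities = read_column(CSV_TEXT, 2)
print(cities)
```

For a tab-separated file, the same helper can be called with delimiter="\t". With a real file, the io.StringIO object would be replaced by a file opened with newline="".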

Hope it helps

Upvotes: 3
