Reputation: 123
Given the text file sample.txt:
2012-01-01 09:00 San Diego Men's Clothing 214.05 Amex
2012-01-01 09:00 San Diego Women's Clothing 153.57 Visa
2012-01-01 09:00 Omaha Music 66.08 Cash
I want to be able to read only the text for the third column. This code
for line in open("sample.txt"):
    city = line.split()[2]
    print(city)
only reads part of the third column:
San
San
Omaha
but what I want is:
San Diego
San Diego
Omaha
How do I do this?
Upvotes: 2
Views: 1044
Reputation: 2991
It appears your input file has fixed-width fields. In that case you might be able to achieve your goal by slicing each line, e.g.
>>> for line in open('test.txt'):
...     print(line[20:32])
...
San Diego
San Diego
Omaha
You could add a .strip() to trim off trailing spaces if you need that for further processing.
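If the columns really are fixed width, a slightly more general sketch is to record every column's start and end offsets once and slice each line with them. Only the city range 20:32 comes from the session above; the other offsets, the make_line helper and the padded sample lines are hypothetical, built so the fields land exactly on those offsets (the extracted sample.txt above does not preserve its original spacing).

```python
# Hypothetical column layout as (name, start, end) offsets. Only the
# "city" range (20, 32) comes from the answer above; the rest are
# assumptions that match the made-up lines built below.
COLUMNS = [
    ("date", 0, 10),
    ("time", 11, 16),
    ("city", 20, 32),
    ("category", 32, 52),
    ("amount", 52, 60),
    ("payment", 60, None),  # None: slice to the end of the line
]


def parse_fixed_width(line):
    """Slice one fixed-width line into a dict of stripped fields."""
    return {name: line[start:end].strip() for name, start, end in COLUMNS}


def make_line(date, time, city, category, amount, payment):
    """Hypothetical helper: pad fields so they sit at the COLUMNS offsets."""
    return (date + " " + time + "    " + city.ljust(12)
            + category.ljust(20) + amount.ljust(8) + payment + "\n")


lines = [
    make_line("2012-01-01", "09:00", "San Diego", "Men's Clothing", "214.05", "Amex"),
    make_line("2012-01-01", "09:00", "San Diego", "Women's Clothing", "153.57", "Visa"),
    make_line("2012-01-01", "09:00", "Omaha", "Music", "66.08", "Cash"),
]

for line in lines:
    print(parse_fixed_width(line)["city"])
```

Naming the offsets once keeps the magic numbers in one place, so adjusting the layout for the real file only means editing COLUMNS.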
Upvotes: 1
Reputation: 258
You will need to preprocess your input file by adding a delimiter, which you then pass to split(). Like this:
2012-01-01, 09:00, San Diego, Men's Clothing, 214.05, Amex
2012-01-01, 09:00, San Diego, Women's Clothing, 153.57, Visa
2012-01-01, 09:00, Omaha, Music, 66.08, Cash
Then
for line in open("sample.txt"):
    # strip() removes the space that follows each comma (and the newline)
    city = line.split(",")[2].strip()
    print(city)
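Once the file has comma delimiters, Python's standard csv module can do the splitting for you. With skipinitialspace=True it ignores the blank after each comma, so the city comes back as 'San Diego' rather than ' San Diego'. This sketch reads the comma-separated sample above from an in-memory io.StringIO instead of a real sample.txt so that it runs on its own.

```python
import csv
import io

# The comma-separated sample from above, held in memory so the sketch is
# self-contained. With a real file, pass open("sample.txt", newline="") instead.
data = io.StringIO(
    "2012-01-01, 09:00, San Diego, Men's Clothing, 214.05, Amex\n"
    "2012-01-01, 09:00, San Diego, Women's Clothing, 153.57, Visa\n"
    "2012-01-01, 09:00, Omaha, Music, 66.08, Cash\n"
)

# skipinitialspace=True drops the space that follows each comma.
cities = [row[2] for row in csv.reader(data, skipinitialspace=True)]

for city in cities:
    print(city)
```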
Upvotes: -1
Reputation: 26335
Since the items in your sample.txt are mostly separated by two spaces, you need to use split('  ') (two spaces) instead. By default, split() splits on every run of whitespace, turning "Men's Clothing" into ["Men's", "Clothing"], which is not what you want.
First, you can view your items with:
with open('sample.txt') as in_file:
    for line in in_file:
        # split on two spaces, strip leftovers and drop empty pieces
        items = [x.strip() for x in line.strip().split('  ') if x.strip()]
        print(items)
Which outputs:
['2012-01-01', '09:00', 'San Diego', "Men's Clothing", '214.05', 'Amex']
['2012-01-01', '09:00', 'San Diego', "Women's Clothing", '153.57', 'Visa']
['2012-01-01', '09:00', 'Omaha', 'Music', '66.08', 'Cash']
Now if you want to extract the third column, replace print(items) inside the loop with:
print(items[2])
Which gives:
San Diego
San Diego
Omaha
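A variation on the same idea uses the standard re module to split on any run of two or more whitespace characters, so gaps of three or four spaces do not produce empty items at all. Like the answer above, it assumes no single field contains two consecutive spaces; the sample lines are made up with deliberately uneven gaps.

```python
import re

# Two or more consecutive whitespace characters count as one separator,
# so the single space inside "San Diego" is left alone.
SEPARATOR = re.compile(r"\s{2,}")


def split_fields(line):
    """Split a line on runs of 2+ whitespace characters."""
    return SEPARATOR.split(line.strip())


# Made-up lines with gaps of 2, 3, 4 and more spaces between fields.
lines = [
    "2012-01-01  09:00   San Diego    Men's Clothing  214.05  Amex\n",
    "2012-01-01  09:00   San Diego    Women's Clothing  153.57  Visa\n",
    "2012-01-01  09:00   Omaha        Music  66.08  Cash\n",
]

for line in lines:
    print(split_fields(line)[2])
```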
Upvotes: 0
Reputation: 174
The fields in your text file are separated by at least two spaces, so splitting on two spaces and then stripping the remaining spaces from the ends with strip() works.
with open('sample.txt', 'r') as file_handle:
    for line in file_handle:
        # split on two spaces, then trim leftover spaces and the newline
        city = line.split('  ')[2].strip()
        print(city)
yields:
San Diego
San Diego
Omaha
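One caveat, assuming some gaps in the real file are wider than two spaces: split('  ') treats each pair of spaces as a separator and does not merge consecutive ones, so a four-space gap yields an empty field and shifts the indexes. The line below is made up to show this, along with the strip-and-filter fix.

```python
# A made-up line where the gap before the city is four spaces, not two.
line = "2012-01-01  09:00    San Diego  Music  66.08  Cash\n"

# Splitting on exactly two spaces turns the 4-space gap into an empty
# field, so index 2 is no longer the city.
naive = line.split("  ")[2].strip()
print(repr(naive))

# Stripping each piece and dropping empty ones restores the positions.
fields = [part.strip() for part in line.split("  ") if part.strip()]
print(fields[2])
```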
Upvotes: 0
Reputation: 386
It does look like your file is separated by tabs (or \t).
Have you tried splitting it by tabs? Instead of city=line.split()[2], try city=line.split('\t')[2].
Anyway, it looks like this file was generated by Excel or a similar program. Have you tried exporting it to CSV (comma-separated values) format instead of plain text? Then you can simply split by commas, like city=line.split(',')[2].
Hope it helps
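If the file does turn out to be tab-separated, the standard csv module can read it with delimiter='\t'. The same module with its default comma delimiter also handles quoted CSV fields that themselves contain commas, which a plain line.split(',') would cut in half. This sketch assumes tabs between fields, uses an in-memory io.StringIO in place of sample.txt, and the quoted "Portland, OR" value is made up purely to show the quoting case.

```python
import csv
import io

# Assumed tab-separated version of the sample data, held in memory.
tsv = io.StringIO(
    "2012-01-01\t09:00\tSan Diego\tMen's Clothing\t214.05\tAmex\n"
    "2012-01-01\t09:00\tSan Diego\tWomen's Clothing\t153.57\tVisa\n"
    "2012-01-01\t09:00\tOmaha\tMusic\t66.08\tCash\n"
)
tab_cities = [row[2] for row in csv.reader(tsv, delimiter="\t")]
print(tab_cities)

# A made-up CSV line whose city contains a comma and is therefore quoted.
csv_line = '2012-01-01,09:00,"Portland, OR",Music,66.08,Cash\n'
print(csv_line.split(",")[2])           # naive split breaks the quoted field
print(next(csv.reader([csv_line]))[2])  # csv.reader respects the quotes
```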
Upvotes: 3