Reputation: 31
In Python, I'm reading a large file with many many lines. Each line contains a number and then a string such as:
[37273738] Hello world!
[83847273747] Hey my name is James!
And so on...
After I read the txt file and put it into a list, I was wondering how I would be able to extract the number and then sort that whole line of code based on the number?
file = open("info.txt","r")
myList = []
for line in file:
line = line.split()
myList.append(line)
What I would like to do:
since the number in message one falls between 37273700 and 38000000, I'll sort that (along with all other lines that follow that rule) into a separate list
Upvotes: 3
Views: 90
Reputation: 28963
How about:
# ---
# Function which gets a number from a line like so:
# - searches for the pattern: start_of_line, [, sequence of digits
# - if that's not found (e.g. empty line) return 0
# - if it is found, try to convert it to a number type
# - return the number, or 0 if that conversion fails
def extract_number(line):
import re
search_result = re.findall('^\[(\d+)\]', line)
if not search_result:
num = 0
else:
try:
num = int(search_result[0])
except ValueError:
num = 0
return num
# ---
# Read all the lines into a list
with open("info.txt") as f:
lines = f.readlines()
# Sort them using the number function above, and print them
lines = sorted(lines, key=extract_number)
print ''.join(lines)
It's more resilient in the case of lines without numbers, it's more adjustable if the numbers might appear in different places (e.g. spaces at the start of the line).
(Obligatory suggestion not to use file
as a variable name because it's a builtin function name already, and that's confusing).
Now there's an extract_number()
function, it's easier to filter:
lines2 = [L for L in lines if 37273700 < extract_number(L) < 38000000]
print ''.join(lines2)
Upvotes: 1
Reputation: 3961
Use tuple as key value:
for line in file:
line = line.split()
keyval = (line[0].replace('[','').replace(']',''),line[1:])
print(keyval)
myList.append(keyval)
Sort
my_sorted_list = sorted(myList, key=lambda line: line[0])
Upvotes: 1
Reputation: 8437
This does exactly what you need (for the sorting part)
my_sorted_list = sorted(my_list, key=lambda line: int(line[0][1:-2]))
Upvotes: 1