rickystevens17

Reputation: 31

Python reading file and analysing lines with substring

In Python, I'm reading a large file with many lines. Each line contains a number in square brackets followed by a string, such as:

[37273738] Hello world!
[83847273747] Hey my name is James!

And so on...

After I read the txt file and put it into a list, how can I extract the number from each line and then sort the whole lines based on that number?

file = open("info.txt","r")
myList = []

for line in file:
    line = line.split()
    myList.append(line)

What I would like to do:

Since the number in the first message falls between 37273700 and 38000000, I'd like to put that line (along with all other lines that follow that rule) into a separate list.

Upvotes: 3

Views: 90

Answers (3)

TessellatingHeckler

Reputation: 28963

How about:

# ---
# Function which gets a number from a line like so:
#  - searches for the pattern: start_of_line, [, sequence of digits
#  - if that's not found (e.g. empty line) return 0
#  - if it is found, try to convert it to a number type
#  - return the number, or 0 if that conversion fails

import re

def extract_number(line):
    # Raw string so the backslashes reach the regex engine unchanged
    search_result = re.findall(r'^\[(\d+)\]', line)
    if not search_result:
        num = 0
    else:
        try:
            num = int(search_result[0])
        except ValueError:
            num = 0

    return num

# ---

# Read all the lines into a list
with open("info.txt") as f:
    lines = f.readlines()

# Sort them using the number function above, and print them
lines = sorted(lines, key=extract_number)
print(''.join(lines))

This is more resilient to lines without numbers, and it's easier to adjust if the numbers might appear in different places (e.g. after spaces at the start of the line).
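
As a rough sketch of that kind of adjustment, assuming lines may begin with stray whitespace: the hypothetical variant below (extract_number_lenient is an illustrative name, not part of the code above) adds \s* in front of the bracket and precompiles the pattern.

```python
import re

# Hypothetical variant: \s* tolerates leading whitespace before the bracket.
NUMBER_AT_START = re.compile(r'^\s*\[(\d+)\]')

def extract_number_lenient(line):
    """Return the bracketed number at the start of line, or 0 if there is none."""
    match = NUMBER_AT_START.match(line)
    # \d+ only matches digits, so int() cannot fail here
    return int(match.group(1)) if match else 0

print(extract_number_lenient("[37273738] Hello world!"))  # 37273738
print(extract_number_lenient("   [42] indented line"))    # 42
print(extract_number_lenient("no number here"))           # 0
```

Returning 0 for lines without a number keeps the same convention as extract_number(), so such lines sort to the front.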

(Obligatory suggestion: don't use file as a variable name. It's already a builtin name in Python 2, and shadowing it is confusing.)


Now that there's an extract_number() function, it's easier to filter:

lines2 = [L for L in lines if 37273700 < extract_number(L) < 38000000]
print(''.join(lines2))
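
If the goal is a separate list for the matching lines while keeping the rest, one possible sketch (written for Python 3) is a single pass that puts each line into one of two lists. The helper name split_by_range and the sample lines are made up for illustration, and a compact extract_number() is repeated so the snippet runs on its own.

```python
import re

def extract_number(line):
    """Return the number in a leading [digits] tag, or 0 if there is none."""
    match = re.match(r'\[(\d+)\]', line)
    return int(match.group(1)) if match else 0

def split_by_range(lines, low, high):
    """Hypothetical helper: return (inside, outside), where inside holds the
    lines whose number is strictly between low and high."""
    inside, outside = [], []
    for line in lines:
        if low < extract_number(line) < high:
            inside.append(line)
        else:
            outside.append(line)
    return inside, outside

# Made-up sample data in the question's format
sample = [
    "[83847273747] Hey my name is James!\n",
    "[37900000] Another message\n",
    "[37273738] Hello world!\n",
]

inside, outside = split_by_range(sample, 37273700, 38000000)
inside.sort(key=extract_number)  # sort the selected lines by their number
print(''.join(inside), end='')
```

Because split_by_range() only iterates over its argument, an open file object can be passed in place of the list, which avoids reading a large file into memory first.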

Upvotes: 1

RockOnGom

Reputation: 3961

Use tuple as key value:

for line in file:
    line = line.split()
    # Convert to int so the sort below is numeric rather than alphabetical
    keyval = (int(line[0].replace('[','').replace(']','')), line[1:])
    print(keyval)
    myList.append(keyval)

Then sort on the first element of each tuple:

my_sorted_list = sorted(myList, key=lambda line: line[0])
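
One thing worth checking with this approach is whether the key is a number or a string, since strings compare character by character. A small illustration with made-up tuples:

```python
# Made-up (key, words) tuples where the key is still a string
rows = [("9", ["nine"]), ("10", ["ten"]), ("100", ["hundred"])]

# Sorting on the string compares character by character: "1..." < "9"
by_text = sorted(rows, key=lambda row: row[0])

# Converting the key to int gives numeric order
by_number = sorted(rows, key=lambda row: int(row[0]))

print([row[0] for row in by_text])    # ['10', '100', '9']
print([row[0] for row in by_number])  # ['9', '10', '100']
```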

Upvotes: 1

DevLounge

Reputation: 8437

This does what you need for the sorting part, assuming each element of myList is a split line whose first item looks like [37273738]:

my_sorted_list = sorted(myList, key=lambda line: int(line[0][1:-1]))

Upvotes: 1
