user8315735


Separate lines in Python

I have a .txt file with 3 columns. The first one is just numbers. The second one is a number from 0 to 7. The last one is sentence-like text. I want to keep them in separate lists so that I can match rows by their numbers, and I want to write a function for this. How can I separate the columns into different lists without disrupting the rows?

An example of the .txt file:

1234    0    my name is
6789    2    I am coming
2346    1    are you new?
1234    2    Who are you?
1234    1    how's going on?

And I have to keep them like this:

----1----   

1234    0    my name is 

1234    1    how's going on? 

1234    2    Who are you?

----2----   

2346    1    are you new?


----3-----   

6789    2    I am coming

What I've tried so far:

inputfile=open('input.txt','r').read()

m_id=[] 
p_id=[] 
packet_mes=[]

input_file=inputfile.split(" ")

print(input_file)

input_file=line.split() 
m_id=[int(x) for x in input_file if x.isdigit()] 
p_id=[x for x in input_file if not x.isdigit()]

Upvotes: 0

Views: 87

Answers (3)

kriss

Reputation: 24157

Maybe you want something like this:

import re

# Collect data from input file
h = {}
with open('input.txt', 'r') as f:
    for line in f:
        # Raw string so that \d and \s reach the regex engine unchanged
        res = re.match(r"^(\d+)\s+(\d+)\s+(.*)$", line)
        if res:
            if res.group(1) not in h:
                h[res.group(1)] = []
            h[res.group(1)].append((res.group(2), res.group(3)))

# Output result
for i, x in enumerate(sorted(h.keys())):
    print("-------- %s -----------" % (i+1))
    for y in sorted(h[x]):
        print("%s %s %s" % (x, y[0], y[1]))

The result is as follows (add more newlines if you like):

-------- 1 -----------
1234 0 my name is
1234 1 how's going on?
1234 2 Who are you?
-------- 2 -----------
2346 1 are you new?
-------- 3 -----------
6789 2 I am coming

It's based on regexes (the re module in Python). This is a good tool when you want to match simple line-based patterns.

Here it relies on whitespace as the column separator, but it can just as easily be adapted to fixed-width columns.

The results are collected in a dictionary of lists, each list containing tuples (pairs) of position and text.

The program waits until output to sort the items.
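
Since the question asks for a function, the same idea could be wrapped up roughly as below. This is only a sketch: the name group_lines, the LINE_RE constant and the sample lines are made up for illustration, and it swaps in a defaultdict and converts the position to int (so sorting is numeric), which the code above does not do.

```python
import re
from collections import defaultdict

# Same pattern as above: id, position, then the rest of the line.
LINE_RE = re.compile(r"^(\d+)\s+(\d+)\s+(.*)$")


def group_lines(lines):
    """Group (position, text) pairs by id; lines that do not match are skipped."""
    groups = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            msg_id, position, text = match.groups()
            groups[msg_id].append((int(position), text))
    return groups


# Assumed sample data; an open file object could be passed instead.
sample = [
    "1234    0    my name is\n",
    "6789    2    I am coming\n",
    "2346    1    are you new?\n",
    "1234    2    Who are you?\n",
    "1234    1    how's going on?\n",
]

groups = group_lines(sample)
for msg_id in sorted(groups):
    for position, text in sorted(groups[msg_id]):
        print(msg_id, position, text)
```

Because the function accepts any iterable of lines, it should work the same with a list (as here) or with the file object from a with open(...) block.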

Upvotes: 1

Ender Look

Reputation: 2391

It's fairly ugly code, but it's easy to understand.

raw = []
with open("input.txt", "r") as file:
    for x in file:
        raw.append(x.strip().split(None, 2))
raw = sorted(raw)

title = raw[0][0]
refined = []
cluster = []
for x in raw:
    if x[0] == title:
        cluster.append(x)
    else:
        refined.append(cluster)
        cluster = []
        title = x[0]
        cluster.append(x)
refined.append(cluster)

for number, group in enumerate(refined, 1):  # start at 1 to match the requested output
    print("-"*10+str(number)+"-"*10)
    for line in group:
        print(*line)
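
If you'd rather not write the clustering loop by hand, itertools.groupby from the standard library can probably do the same job. The sketch below is an alternative to the loop above, not the same code: the rows are hard-coded sample data in the shape that x.strip().split(None, 2) would produce, and the numbering starts at 1.

```python
from itertools import groupby

# Assumed sample rows, as produced by x.strip().split(None, 2).
raw = [
    ["1234", "0", "my name is"],
    ["6789", "2", "I am coming"],
    ["2346", "1", "are you new?"],
    ["1234", "2", "Who are you?"],
    ["1234", "1", "how's going on?"],
]

# groupby only merges adjacent items, so the rows must be sorted first.
raw.sort()
refined = [list(rows) for _, rows in groupby(raw, key=lambda row: row[0])]

for number, group in enumerate(refined, 1):
    print("-" * 10 + str(number) + "-" * 10)
    for row in group:
        print(*row)
```

Note that the columns are still strings here, so the sort is lexicographic; that is fine for single-digit positions like 0 to 7.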

Upvotes: 0

cs95

Reputation: 402263

With your current approach, you are reading the entire file as one string and splitting it on single spaces (you'd much rather split on newlines instead, because each record sits on its own line). Furthermore, you're not separating your data into distinct columns properly.
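
To see the difference, here is a small demonstration on an in-memory string. The sample text is assumed to mirror the layout of the file in the question; the variable names are only for illustration.

```python
# Assumed sample text with the same layout as the question's file.
text = "1234    0    my name is\n6789    2    I am coming\n"

# Splitting on a single space mixes all columns together and leaves
# empty strings wherever several spaces occur in a row.
naive = text.split(" ")
print(naive[:5])

# Splitting on newlines first keeps each record intact; maxsplit=2 then
# stops after two splits, so the sentence stays in one piece.
rows = [line.split(None, 2) for line in text.splitlines()]
for row in rows:
    print(row)
```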


You have 3 columns. You can split each line into 3 parts using str.split(None, 2). Passing None splits on any run of whitespace, and 2 limits it to two splits, so the sentence stays in one piece. Each group will be stored as a key-list pair inside a dictionary. Here I use an OrderedDict in case you need to maintain order, but you can just as easily declare o = {} as a normal dictionary with the same grouping (a plain dict only guarantees insertion order from Python 3.7 onwards).

from collections import OrderedDict

o = OrderedDict()
with open('input.txt', 'r') as f:
    for line in f:
        i, j, k = line.strip().split(None, 2)
        o.setdefault(i, []).append([int(i), int(j), k])

print(dict(o))

{'1234': [[1234, 0, 'my name is'],
          [1234, 2, 'Who are you?'],
          [1234, 1, "how's going on?"]],
 '6789': [[6789, 2, 'I am coming']],
 '2346': [[2346, 1, 'are you new?']]}

Always use the with...as context manager when working with file I/O: it closes the file for you and makes for clean code. Also, note that for larger files, iterating over each line is more memory efficient than reading the whole file at once.
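
If you really do want three separate lists, as in your attempt, something along these lines might work. It is a sketch under a few assumptions: io.StringIO stands in for open('input.txt') so the snippet runs on its own, the sample content is made up, and blank or malformed lines are skipped rather than raising an error. The list names m_id, p_id and packet_mes are taken from your code.

```python
import io

# Stand-in for open('input.txt', 'r'); the content is assumed sample data.
f = io.StringIO(
    "1234    0    my name is\n"
    "6789    2    I am coming\n"
    "\n"
    "2346    1    are you new?\n"
)

m_id, p_id, packet_mes = [], [], []
for line in f:
    parts = line.strip().split(None, 2)
    if len(parts) != 3:
        continue  # skip blank or malformed lines instead of raising
    m_id.append(int(parts[0]))
    p_id.append(int(parts[1]))
    packet_mes.append(parts[2])

print(m_id)
print(p_id)
print(packet_mes)
```

The three lists stay aligned: index n in each list refers to the same line of the file, so rows are not mixed up.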

Upvotes: 2
