SoulD82

Reputation: 173

How to extract certain data on different lines from a file in Python

[image: sample of the data file]

This is the code I used for the events, and it worked. But when I tried to modify it to extract the other IDs it stopped working, so the modified version is obviously not correct.

with open('GroupEvent/G0.txt') as f:
    lines = f.readlines()
for i in range(0, len(lines)):
    if lines[i] == '\n':
        nlines = 0
    else:
        line = lines[i]
        entry=line.split()
        for x in entry:
            first_char=x
            EventToMatch = ('E')
            if first_char.startswith(EventToMatch) and nlines == 1 :
              Events.append(first_char)
              nlines = 2
              break
            elif nlines==2:
              Org.append(first_char)
              nlines= 3
              
            elif nlines == 3:
              Yes.append(first_char)
              nlines =4
              

            elif nlines == 4:
              No.append(first_char)
              nlines == 0
              
            else:
             break

I have a file with data like the sample above. In each record, the first line holds the event ID, which starts with E; the second line holds the ID of the person organizing the event; the third line holds the ID of the person who accepted the invite; and the fourth line holds the ID of the person who rejected it. The file has dozens of records like this, separated by one empty line. How can I collect the organizer IDs and the IDs of the people who said yes and no? Capturing the event IDs was easy because they start with E, so I already have a list of event IDs. I am having trouble extracting the others.

Upvotes: 0

Views: 133

Answers (2)

Brian Biddle

Reputation: 21

If all you want is a separate list for each kind of ID, one way I've used in the past is this:

# Initialize a list for each kind of ID you want
event_id = []
org_id = []
accept_id = []
reject_id = []

# Open the file with your data and read it all at once
with open("FILENAME.txt", "r") as file:
    content = file.read()

# Split the content at every blank line by using "\n" twice,
# just like hitting return twice to get a blank line.
# strip() drops trailing newlines so no empty record is left at the end.
split_content = content.strip().split("\n\n")

# What I found easiest was to first create a list of lists,
# one inner list per record, with one element per line
mylist = [item.split("\n") for item in split_content]

# Now append each line of every record to the matching list
# you made at the beginning
for e in mylist:
    event_id.append(e[0])
    org_id.append(e[1])
    accept_id.append(e[2])
    reject_id.append(e[3])

# Now all your IDs are separated into their respective lists.
# You can also append them to a separate file if you would like
# (replace INDEX_OF_ELEMENT with 0, 1, 2, or 3):
with open("FILENAME.txt", "a+") as file_to_append:
    file_to_append.write(e[INDEX_OF_ELEMENT])
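
If it helps, the same split-on-blank-lines idea can be wrapped in a function. This is only a rough sketch: parse_records is a name I made up, the second record in the sample is invented for illustration, and it assumes the ID is always the first whitespace-separated token on its line (so the extra numbers after the event ID are dropped). Records with fewer than four lines are skipped instead of raising an IndexError.

```python
def parse_records(text):
    """Split text into blank-line-separated records and collect the four ID columns."""
    events, organizers, accepted, rejected = [], [], [], []
    for chunk in text.strip().split("\n\n"):
        # Drop empty lines and surrounding whitespace inside the record
        lines = [ln.strip() for ln in chunk.splitlines() if ln.strip()]
        if len(lines) < 4:
            continue  # skip incomplete records rather than raising IndexError
        # Assumption: the ID is the first token on each line
        ids = [ln.split()[0] for ln in lines[:4]]
        events.append(ids[0])
        organizers.append(ids[1])
        accepted.append(ids[2])
        rejected.append(ids[3])
    return events, organizers, accepted, rejected


# Hypothetical sample data: the second record is made up for illustration
sample = (
    "E932 4 1240153200000\nM48462\nM48462\nM65542\n"
    "\n"
    "E933 2 1240153300000\nM1\nM2\nM3\n"
)
events, organizers, accepted, rejected = parse_records(sample)
print(events)     # ['E932', 'E933']
print(rejected)   # ['M65542', 'M3']
```

To run it on a real file, read the file into a string first and pass that string to parse_records.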

Upvotes: 2

m.i.cosacak

Reputation: 730

I generally use a class if the file has a fixed structure, as in a FastQ file, for instance. I put the following lines in input_file.txt; the class below returns them 5 lines at a time (the four data lines plus the blank separator). You can do whatever you want with each group.

input_file.txt

E932 4 1240153200000 #id of an event
M48462 #id of organizer
M48462 #id of accepted invite
M65542 #id of rejected invite

E932 4 1240153200000
M48462
M48462
M65542

E932 4 1240153200000
M48462
M48462
M65542

E932 4 1240153200000
M48462
M48462
M65542

The class code to handle it:

class HandleFile:
    def __init__(self, filename):
        self.input = open(filename, "r")  # assuming it is a text file
        self.currentLine = 0

    def __iter__(self):
        return self

    def __next__(self):
        mylist = []
        for i in range(5):  # each record is 4 data lines plus 1 blank line
            line = self.input.readline()
            self.currentLine += 1
            if line:
                mylist.append(line.strip("\n"))
            else:
                mylist.append(None)  # add None once the end of the file is reached
        if mylist.count(None) == 5:  # nothing left to read: end of file
            self.input.close()
            raise StopIteration
        assert mylist[4] in ("", None)  # the 5th line must be blank (or missing at end of file)
        assert mylist[0].startswith("E")  # or put more conditions here
        return mylist

hf = HandleFile("input_file.txt")
for lst in hf:
    print(lst)

Here is the output:

...
['E932 4 1240153200000 #id of an event', 'M48462 #id of organizer', 'M48462 #id of accepted invite', 'M65542 #id of rejected invite', '']
['E932 4 1240153200000', 'M48462', 'M48462', 'M65542', '']
['E932 4 1240153200000', 'M48462', 'M48462', 'M65542', '']
['E932 4 1240153200000', 'M48462', 'M48462', 'M65542', '']
>>>

NOTE: this code has been modified from here
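
If the records do not always have exactly 5 lines (for example, when the last one has no trailing blank line), a generator that groups consecutive non-blank lines may be more forgiving than the fixed-size class above. This is just a sketch of that alternative, not a replacement for it: iter_records is a name I made up, and it relies only on itertools.groupby from the standard library.

```python
from itertools import groupby


def iter_records(lines):
    """Yield each run of consecutive non-blank lines as a list of stripped strings."""
    # groupby batches neighbouring lines that share the same key:
    # True for blank lines, False for lines with content
    for is_blank, group in groupby(lines, key=lambda ln: not ln.strip()):
        if not is_blank:
            yield [ln.strip() for ln in group]


# Same shape as input_file.txt, but the last record has no trailing blank line
sample_lines = [
    "E932 4 1240153200000\n", "M48462\n", "M48462\n", "M65542\n", "\n",
    "E932 4 1240153200000\n", "M48462\n", "M48462\n", "M65542\n",
]
records = list(iter_records(sample_lines))
for record in records:
    print(record)
```

Because an open file object also yields its lines one by one, it can be passed to iter_records directly in place of sample_lines.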

Upvotes: 2
