Izzat Z.

Reputation: 447

How to split runs of repeated strings into individual txt files?

For example, I have a txt file containing these lines:

chicken
chicken
duck
duck
duck
parrot
parrot
chicken
chicken
chicken

How can I read it line by line and write the first chicken run (2 lines) to 1.txt, duck (3 lines) to 2.txt, parrot (2 lines) to 3.txt, and the last chicken run (3 lines) to 4.txt?

This is as far as I've got:

count = 0

with open("test.txt") as rl:
    for num, line in enumerate (rl, 1):
        s = list(line)
        if "chicken" in line:
            count += 1

            finaljoin = "".join(s)

            print(count)

            with open("chicken.txt", 'a+') as f:
                f.write(finaljoin)

But my solution above only grabs all the chicken lines (5 in total) into one file. The actual plan was to write the first two chicken lines to one txt file and the last three chicken lines to another, because the two runs are separated by other animals.

Upvotes: 0

Views: 60

Answers (3)

Anton vBR

Reputation: 18906

You can do it like this:

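from itertools import groupby

# splitlines() avoids a trailing empty string (and an empty extra file)
# when test.txt ends with a newline
with open('test.txt') as f:
    data = f.read().splitlines()

# groupby yields one (key, group) pair per run of consecutive equal lines
for ind, (_, g) in enumerate(groupby(data), 1):
    with open('{}.txt'.format(ind), 'w') as out:
        out.write('\n'.join(g))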

Explanation:

You can read about itertools.groupby here: https://docs.python.org/2/library/itertools.html#itertools.groupby.

groupby yields pairs of two elements, the key and the group, one pair per run of consecutive equal items. So if we want to loop through a groupby we would do something like this: for key, group in groupby(object): or for k, g in groupby(object):

In this case the keys will be chicken, duck, parrot, chicken and the groups will contain ['chicken', 'chicken'], ['duck','duck... ...] (each group is an iterator over those items rather than a list).

However (now comes the part where I explain ind, (_, g)), to obtain an index as we loop we can use Python's enumerate function, which yields an index together with each item. Typically it looks like this: for index, item in enumerate(list): or for ind, i in enumerate(list).

Now let's say we want to combine enumerate and groupby. Then we could do it like this: for index, (key, group) in enumerate(groupby(object)): or more compact: for ind, (_, g) .... I use _ in this case (and this is Pythonic) to signal that I am not interested in the variable (the key in this case).
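To see what enumerate and groupby produce together without touching any files, here is a minimal sketch that runs on an in-memory list. The list `lines` and the name `runs` are only for illustration; they stand in for the contents of test.txt.

```python
from itertools import groupby

# Stand-in for the lines of test.txt (illustrative data from the question)
lines = ["chicken", "chicken", "duck", "duck", "duck",
         "parrot", "parrot", "chicken", "chicken", "chicken"]

# groupby only merges *consecutive* equal items, so the two chicken
# runs stay separate. Each group is a lazy iterator, hence list().
runs = [(ind, key, list(group))
        for ind, (key, group) in enumerate(groupby(lines), 1)]

for ind, key, group in runs:
    print(ind, key, len(group))
```

This should print four rows (chicken 2, duck 3, parrot 2, chicken 3), numbered 1 to 4, which are the same numbers used for the output file names.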

Upvotes: 1

Gabriel Ben Compte

Reputation: 909

You can try:

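count = 0
with open("test.txt") as readFile:
    previous_line = ""
    archive_name = ""
    for line in readFile:
        # A line that differs from the previous one starts a new run,
        # so move on to the next numbered file
        if line != previous_line:
            previous_line = line
            count += 1
            archive_name = str(count) + ".txt"
        # 'a+' appends, so remove old output files before re-running
        with open(archive_name, 'a+') as f:
            f.write(line)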

That will save "chicken chicken" in 1.txt, "duck duck duck" in 2.txt, "parrot parrot" in 3.txt and "chicken chicken chicken" in 4.txt

Upvotes: 1

Prune

Reputation: 77837

Actually, you haven't figured it out. You have no splitting provision; all you've done is to search for "chicken", wherever it appears, and dump those reconstituted lines into a "chicken.txt" file. You've made no provision for any other animal, and there's no attempt at logic to find those breaks. Also, there's a lot of superfluous code in this, such as repeatedly opening your output file, and generating num, which is never used.

Draw out your basic logic on paper, if needed. The critical step that you're missing is to check the previous animal against the current one. Something like this:

previous = None
with open("test.txt") as zoo:
    for animal in zoo:
        if animal == previous:
            pass  # Process same animal
        else:
            pass  # Process new animal
        previous = animal   # remember animal for next iteration

Can you take it from there?
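For readers who want to check their work, here is one way the two branches might be filled in, opening each output file only once per run. It is a sketch under assumptions: the function name `split_runs` and the `out_dir` parameter are made up for this example, and the demo writes to a temporary directory rather than to a real test.txt.

```python
import os
import tempfile

def split_runs(src_path, out_dir="."):
    """Write each run of identical consecutive lines to 1.txt, 2.txt, ...

    Hypothetical helper; returns the number of files written.
    """
    count = 0
    previous = None
    out = None
    try:
        with open(src_path) as src:
            for raw in src:
                # Strip the newline so a last line without one still matches
                animal = raw.rstrip("\n")
                if animal != previous:
                    # New animal: close the old file, start the next one
                    if out is not None:
                        out.close()
                    count += 1
                    out = open(os.path.join(out_dir, "{}.txt".format(count)), "w")
                    previous = animal
                # Same or new animal: the line goes to the current file
                out.write(animal + "\n")
    finally:
        if out is not None:
            out.close()
    return count

# Demo in a throwaway directory, using the data from the question
with tempfile.TemporaryDirectory() as tmp:
    src_path = os.path.join(tmp, "test.txt")
    with open(src_path, "w") as f:
        f.write("chicken\nchicken\nduck\nduck\nduck\n"
                "parrot\nparrot\nchicken\nchicken\nchicken")
    n_files = split_runs(src_path, tmp)
    with open(os.path.join(tmp, "1.txt")) as f:
        first_run = f.read()
    with open(os.path.join(tmp, "4.txt")) as f:
        last_run = f.read()
```

Opening with 'w' instead of 'a+' means a re-run overwrites old output rather than appending to it, which may or may not be what you want.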

Upvotes: 0
