shadow.T

Reputation: 135

How can I output the file name with its word content in this format in Python?

Say I have a file test.txt containing:

1:text1.txt
2:text2.txt

text1.txt contains:

I am a good person

text2.txt contains:

Bla bla

I would like to output:

I 1
Bla 2    
am 1    
bla 2    
good 1
a 1
person 1

That is, I want to output the file index next to each word in that file. I would post my code, but it is ugly and far from the solution. I'm new to Python, so please be nice. The output does not need to be in any particular order; the sample output above is in random order and is only meant to give an idea of what I'm looking for.

This is my code

`with open("text.txt", "r") as f: text=f.readlines()

for line in text:
  splitted=line.split(":")

splitsplit=splitted[1].split("\n")
files=splitsplit[0]

splittedindicies=splitted[0].split("\n")
indicies=splittedindicies[0]

print indicies[0]
files_list=list(files)
files_l=files.split(" ")
for x in files_l:
    fileshandle=open(x,"r")
    read=fileshandle.readlines()

    for y in read:
        words=y.split(" ")
        words.sort()
        for j in words:
            print j `

My output is:

1 I am a good
person 2 Bla bla

Again, please be nice. I'm an R programmer dealing with Python for the first time.

Upvotes: 1

Views: 59

Answers (2)

Aaditya Ura

Reputation: 12669

You could try a regex-based approach here.

As for your comment asking how you can store the output: the words end up as the values of a dict, keyed by file index, so you can operate on them afterwards.

import re

track = {}
# One or more digits, a colon, then a .txt file name, e.g. "1:text1.txt"
pattern = r'(\d+):(\w+\.txt)'
with open('test.txt', 'r') as index_file:
    for line in index_file:
        for finding in re.finditer(pattern, line):
            with open(finding.group(2)) as text_file:
                for item in text_file:
                    # Extend instead of assigning, so multi-line files keep every word
                    track.setdefault(int(finding.group(1)), []).extend(item.split())

for key, value in track.items():
    for item in value:
        print(key, item)

output:

1 I
1 am
1 a
1 good
1 person
2 Bla
2 bla
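If you want the stored result as something reusable, here is a minimal sketch of the same idea wrapped in a function. The name `collect_words` is hypothetical, and it assumes the listed files live in the same directory as the index file and that every non-blank line has the form `N:filename`. It uses `str.partition` instead of a regex, which is a swap on my part rather than a requirement.

```python
import os
from collections import defaultdict


def collect_words(index_path):
    """Return {file_no: [word, ...]} for an index file of 'N:filename' lines."""
    # Assumption: file names in the index are relative to the index file itself.
    base_dir = os.path.dirname(os.path.abspath(index_path))
    track = defaultdict(list)
    with open(index_path) as index_file:
        for line in index_file:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            file_no, _, filename = line.partition(":")
            with open(os.path.join(base_dir, filename)) as text_file:
                for text_line in text_file:
                    # Accumulate words from every line of the file
                    track[int(file_no)].extend(text_line.split())
    return dict(track)
```

You can then loop over `collect_words("test.txt").items()` and print each word next to its key, or keep the dict around for further processing.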

Upvotes: 1

timgeb

Reputation: 78690

Since the order of the words does not matter, why don't you just process the files in the order they appear in test.txt? There are a couple of errors in your code, the first one on line 3, where you overwrite the content of splitted on every iteration. I'm also particularly confused by your call to sort.

Anyway, here's one way to do it.

>>> with open('test.txt') as filenames:
...      for line in filenames:
...          file_no, filename = line.strip().split(':')
...          with open(filename) as f:
...              for line in f:
...                  for word in line.split():
...                      print('{} {}'.format(word, file_no))
... 
I 1
am 1
a 1
good 1
person 1
Bla 2
bla 2
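
To make the overwrite problem in the question's loop concrete, here is a small sketch that uses in-memory lines instead of files. The names `last_only` and `parsed` are mine, and it assumes the loop body in the question really does end after the `split(":")` line, as the posted indentation suggests.

```python
lines = ["1:text1.txt\n", "2:text2.txt\n"]

# Buggy pattern: the name is rebound on every pass,
# so after the loop only the last line's parts are left.
for line in lines:
    splitted = line.split(":")
last_only = splitted

# Fix: collect every parsed line (or do the per-file work inside the loop).
parsed = []
for line in lines:
    file_no, filename = line.strip().split(":")
    parsed.append((file_no, filename))

print(last_only)
print(parsed)
```

The first print shows only the parts of the second line, while the second shows one `(file_no, filename)` pair per line of the index.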

Upvotes: 1
