Lelouch Lamperouge

Reputation: 8411

python file manipulation

I have a file with entries such as: 26 1 33 2 . . .

and another file with sentences in English.

I have to write a script to print the 1st word in sentence number 26 and the 2nd word in sentence 33. How do I do it?

Upvotes: 0

Views: 1333

Answers (4)

inspectorG4dget

Reputation: 113905

In the following code, I am assuming that sentences end with '.'. You can easily modify it to accommodate other sentence delimiters. Note that abbreviations will therefore be a source of bugs.

Also, I am going to assume that words are delimited by spaces.

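sentences = []
queries = []

# Read the whole English text (the file names used here are placeholders).
with open('sentences.txt') as file2:
    english = file2.read()

# Cut the text at each '.', storing every sentence as a list of words.
while english.strip():
    period = english.find('.')
    if period == -1:               # no closing period: the rest is one sentence
        period = len(english) - 1
    sentences.append(english[:period + 1].split())
    english = english[period + 1:]

# Collect all the numbers, however they are spread across lines.
q = ""
with open('numbers.txt') as file1:
    for line in file1:
        q += " " + line.strip()

q = q.split()
for i in range(0, len(q) - 1, 2):
    sentence = int(q[i])
    word = int(q[i + 1])
    queries.append((sentence, word))

# Sentence and word numbers are counted from 1, list indices from 0.
for s, w in queries:
    print(sentences[s - 1][w - 1])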

I haven't tested this, so please let me know (preferably with the case that broke it) if it doesn't work and I will look into the bugs.

Hope this helps

Upvotes: 0

Jay Zhu

Reputation: 1672

The following code should do the task, assuming that the files are not too large. You may have to modify it to deal with edge cases (like double spaces, etc.).

# Get numbers from file
num = []
with open('1.txt') as file:
    num = [int(n) for n in file.read().split()]

# Get text from file
text = []
with open('2.txt') as file:
    text = file.readlines()

# Parse text into a list of word lists, one per sentence.
data = []
for line in text:                        # For each paragraph in the text
    sentences = line.strip().split('.')  # Split it into sentences
    for sentence in sentences:           # For each sentence in the paragraph
        words = sentence.split()         # Split it into a list of words
        if len(words) > 0:               # Skip empty pieces
            data.append(words)

# Get desired result: numbers come in (sentence, word) pairs, counted from 1
for i in range(0, len(num) - 1, 2):
    print(data[num[i] - 1][num[i + 1] - 1])

Upvotes: 2

Alex Martelli

Reputation: 881487

The big issue is that you have to decide what separates "sentences". For example, is a '.' the end of a sentence? Or maybe part of an abbreviation, e.g. the one I've just used?-) Secondarily, and less difficult, what separates "words", e.g., is "TCP/IP" one word, or two?

Once you have sharply defined these rules, you can easily read the file of text into a list of "sentences", each of which is a list of "words". Then, you read the other file as a sequence of pairs of numbers, and use them as indices into the overall list and inside the sublist thus identified. But the problem of sentence and word separation is really the hard part.
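
To make the list-of-lists idea concrete, here is a minimal sketch under one deliberately naive set of rules: a sentence ends at '.', '!' or '?' followed by whitespace, and words are separated by whitespace. The function names (split_sentences, lookup), the sample text and the rules themselves are assumptions made for illustration only; abbreviations such as "e.g." would still be split incorrectly.

```python
import re

def split_sentences(text):
    """Split text into sentences, each a list of words (naive rules)."""
    # Assumed rule: a sentence ends at '.', '!' or '?' followed by whitespace.
    raw = re.split(r'(?<=[.!?])\s+', text.strip())
    # Assumed rule: words are separated by whitespace; punctuation stays attached.
    return [s.split() for s in raw if s.strip()]

def lookup(sentences, numbers):
    """Treat numbers as (sentence, word) pairs, both counted from 1."""
    pairs = zip(numbers[0::2], numbers[1::2])
    return [sentences[s - 1][w - 1] for s, w in pairs]

text = "The cat sat. Dogs bark loudly! Is TCP/IP one word?"
sentences = split_sentences(text)
print(lookup(sentences, [1, 2, 3, 2]))  # word 2 of sentence 1, word 2 of sentence 3
```

Under these rules "TCP/IP" counts as a single word; a different word rule would only require changing the body of split_sentences.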

Upvotes: 0

Eli Bendersky

Reputation: 273366

Here's a general sketch:

  • Read the first file into a list (a numeric entry in each element)
  • Read the second file into a list (a sentence in each element)
  • Iterate over the entry list, for each number find the sentence and print its relevant word
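
The three steps above could be sketched roughly as follows. This assumes that the second file holds one sentence per line, that the numbers come in (sentence, word) pairs counted from 1, and that words are separated by whitespace; the function names and the in-memory sample data are hypothetical stand-ins for the real files.

```python
import io

def read_numbers(f):
    """Step 1: read every whitespace-separated number into a list."""
    return [int(token) for token in f.read().split()]

def read_sentences(f):
    """Step 2: read one sentence per non-empty line (an assumption)."""
    return [line.strip() for line in f if line.strip()]

def pick_words(numbers, sentences):
    """Step 3: for each (sentence, word) pair, counted from 1, pick the word."""
    result = []
    for i in range(0, len(numbers) - 1, 2):
        words = sentences[numbers[i] - 1].split()
        result.append(words[numbers[i + 1] - 1])
    return result

# In-memory stand-ins for the two files; with real files, pass open(...) objects.
numbers_file = io.StringIO("2 1 3 2\n")
sentences_file = io.StringIO("I like tea.\nPython is fun.\nFiles hold data.\n")

for word in pick_words(read_numbers(numbers_file), read_sentences(sentences_file)):
    print(word)
```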

Now, if you show some effort of how you tried to implement this in Python, you will probably get more help.

Upvotes: 1
