user11094200

Reputation:

How to not count punctuation between words

What is the best way to count a character such as an apostrophe only when it appears inside a word, as in "shouldn't"?

For example, "I shouldn't do that" counts once, but "'I will not do that'" counts zero.

Basically, how can I use counts to count apostrophes inside words and not those used as quotes?

I haven't been able to get anything to work. I can only use a basic for loop to count every apostrophe, but I can't narrow it down any further.

for sentence in split_sentences:
    for w in sentence:
        for p in punctuation:
            if p == w:
                # Key the lookup on p, the same key that is incremented below.
                if p in counts:
                    counts[p] += 1
                else:
                    counts[p] = 1
            else:
                pass

Given a list of words, it should count punctuation only inside a word, not around it. So "Shouldn't" will count but "'should'" will not.

Upvotes: 2

Views: 720

Answers (2)

wovano

Reputation: 5120

You can use the regular expression [a-zA-Z]'[a-zA-Z] to find all single quotes that are surrounded by letters.

The requirement for the hyphen isn't completely clear to me. If it has the same requirement (i.e. it only counts when surrounded by letters), then the regular expression [a-zA-Z]['-][a-zA-Z] will do the trick: it will count quotes as well as hyphens.

If you should count all hyphens, then you could just use the str.count method (e.g. "test-string".count("-") returns 1).

Here is some example code, assuming the hyphens must also be counted only if they are surrounded by letters:

import re

TEST_SENTENCES = (
    "I shouldn't do that",
    "'I will not do that'",
    "Test-hyphen"
)

PATTERN = re.compile("[a-zA-Z]['-][a-zA-Z]")

for sentence in TEST_SENTENCES:
    print(len(PATTERN.findall(sentence)))

Output:

1
0
1
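One caveat worth noting: findall returns non-overlapping matches, and the pattern above consumes the letters on both sides, so in a word like "rock'n'roll" only the first apostrophe is counted. A minimal sketch of one way around this, assuming that every letter-surrounded apostrophe or hyphen should count, is to use lookbehind and lookahead assertions. The names INNER_PUNCT and count_inner are made up for this illustration:

```python
import re

# Lookbehind/lookahead check the neighbouring letters without consuming them,
# so two apostrophes that share a letter ("rock'n'roll") are both matched.
INNER_PUNCT = re.compile(r"(?<=[a-zA-Z])['-](?=[a-zA-Z])")


def count_inner(sentence):
    """Count apostrophes and hyphens that sit between two letters."""
    return len(INNER_PUNCT.findall(sentence))


print(count_inner("rock'n'roll"))           # 2
print(count_inner("I shouldn't do that"))   # 1
print(count_inner("'I will not do that'"))  # 0
```

For the three test sentences above the result is the same as before; the difference only shows up when two punctuation marks are separated by a single letter.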

Upvotes: 0

Netwave

Reputation: 42796

You can check if it is inside the word:

for sentence in split_sentences:
    for w in sentence:
        for p in punctuation:
            # Count p only if it is in the word but not at either end.
            if p in w and w[0] != p and w[-1] != p:
                if p in counts:
                    counts[p] += 1
                else:
                    counts[p] = 1

The important line is this: if p in w and w[0] != p and w[-1] != p. There are 3 rules for the punctuation to count:

  • The punctuation p is in the word w
  • The word w does not start (w[0]) with the punctuation p
  • The word w does not end (w[-1]) with the punctuation p

A more Pythonic way to do this is to use the str methods startswith and endswith:

...
if p in w and not w.startswith(p) and not w.endswith(p):
   ...
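As a possible variation on this check, here is a minimal self-contained sketch that strips any punctuation wrapped around each word and then counts what is left inside it. The function name count_inner_punctuation, the default punctuation set "'-", and splitting sentences on whitespace are all assumptions made for illustration; they are not from the question:

```python
import string
from collections import Counter


def count_inner_punctuation(sentences, punctuation="'-"):
    """Count punctuation characters that appear inside words only."""
    counts = Counter()
    for sentence in sentences:
        for word in sentence.split():
            # Remove punctuation around the word (quotes, commas, ...).
            core = word.strip(string.punctuation)
            for p in punctuation:
                n = core.count(p)
                if n:
                    counts[p] += n
    return counts


sentences = ["I shouldn't do that", "'I will not do that'", "Test-hyphen"]
print(count_inner_punctuation(sentences))
```

Unlike the p in w check, this counts every occurrence inside a word rather than at most one per word, and a contraction that is itself wrapped in quotes, such as 'shouldn't', still counts once.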

Upvotes: 3
