Reputation: 66
I'm trying to build a Python program similar to wordcounter.net (https://wordcounter.net/). I have an Excel file with one column containing the text to be analyzed. Using pandas and other functions, I created a single-word frequency counter.
But now I need to extend it to find multi-word patterns.
For example, suppose the text is " Happy face sad face mellow little baby sweet Happy face face mellow sad face mellow ".
It should then be able to find patterns such as:
Two word density
Pattern Count
"Happy face" 2
"sad face" 2
"face mellow" 3
....
Three word density
Pattern Count
"Happy face sad" 1
"face sad face" 1
....
I also tried:
for match in re.finditer(pattern, line):
But this requires me to specify each pattern manually, and I want the program to find the patterns automatically.
Can anyone help with how to proceed?
Upvotes: 0
Views: 59
Reputation: 7627
text = 'Happy face sad face mellow little baby sweet Happy face face mellow sad face mellow'

# Count how many times each word occurs.
d = {}
for s in text.split():
    d.setdefault(s, 0)
    d[s] += 1

# Group the words by their count: {count: [words]}.
out = {}
for k, v in d.items():
    out.setdefault(v, []).append(k)

# Print the groups, most frequent first.
for i in sorted(out.keys(), reverse=True):
    print(f'{i} word density:')
    print(f'\t{out[i]}')
Output
5 word density:
['face']
3 word density:
['mellow']
2 word density:
['Happy', 'sad']
1 word density:
['little', 'baby', 'sweet']
from collections import Counter

def freq(lst, n):
    # Build every run of n consecutive words (an n-gram) as one string.
    lstn = []
    for i in range(len(lst) - (n - 1)):
        lstn.append(" ".join(lst[i:i + n]))
    # Count how often each n-gram occurs.
    out = Counter(lstn)
    print(f'{n} word density:')
    for k, v in out.items():
        print(f'\t"{k}" {v}')

text = 'Happy face sad face mellow little baby sweet Happy face face mellow sad face mellow'
lst = text.split()
freq(lst, 2)
freq(lst, 3)
Output
2 word density:
"Happy face" 2
"face sad" 1
"sad face" 2
"face mellow" 3
"mellow little" 1
"little baby" 1
"baby sweet" 1
"sweet Happy" 1
"face face" 1
"mellow sad" 1
3 word density:
"Happy face sad" 1
"face sad face" 1
"sad face mellow" 2
"face mellow little" 1
"mellow little baby" 1
"little baby sweet" 1
"baby sweet Happy" 1
"sweet Happy face" 1
"Happy face face" 1
"face face mellow" 1
"face mellow sad" 1
"mellow sad face" 1
Upvotes: 2
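Since the question mentions a whole column of text rather than a single string, here is a minimal sketch of the same n-gram counting applied to several rows. The function name ngram_counts, the lowercase flag, and the regex used to split words are my own assumptions, not part of the answer above. It assumes every item is a string, so empty cells would need to be dropped or converted first. N-grams are counted per row and never span two rows.

```python
from collections import Counter
import re


def ngram_counts(texts, n, lowercase=True):
    """Count n-word sequences across an iterable of text strings (hypothetical helper)."""
    counts = Counter()
    for text in texts:
        if lowercase:
            text = text.lower()
        # Assumption: a "word" is a run of letters, digits, underscores, or apostrophes.
        words = re.findall(r"[\w']+", text)
        # Zipping n staggered slices of the word list yields each n-gram exactly once.
        grams = zip(*(words[i:] for i in range(n)))
        counts.update(" ".join(g) for g in grams)
    return counts


text = 'Happy face sad face mellow little baby sweet Happy face face mellow sad face mellow'
rows = [text]  # stand-in for the values of the text column

for n in (2, 3):
    print(f'{n} word density:')
    # most_common() lists the patterns from most to least frequent.
    for phrase, count in ngram_counts(rows, n).most_common():
        print(f'\t"{phrase}" {count}')
```

Any iterable of strings should work for rows, so a pandas column could presumably be passed directly once missing values are removed. Lowercasing is optional: with lowercase=True, "Happy face" and "happy face" are counted as the same pattern.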