Rajeev

Reputation: 46969

Text/Word counting in a file from python

Chat.txt

ID674 25/01/1986 Thank you for choosing Optimus prime. Please wait for an Optimus prime Representative to respond. You are currently number 0 in the queue. You should be connected to an agent in approximately 0 minutes.. You are now chatting with 'Tom' 0
ID674 2gb Hi there! Welcome to Optus Web Chat 0/0/0 . How can I help you today?  1 
ID674 25-01-1986 I would like to change my bill plan from $0 with 0 expiry to something else $136. I find it very unuseful. Sam my phone no is 9838383821   2

The text above is just a few example lines from the file. My requirement is that all dates, for example 25/01/1986 or 0/0/0, should be replaced with "DATE123".
Smileys such as :) should be replaced with "smileys123". Currencies, i.e. $0 or $136, should be replaced with "Currency123".
'TOM' (usually the agent's name in single quotes) should be replaced with AGENT123,
and many more. The output should be the number of occurrences of each string, as shown:

DATE123=2  smileys123=2 Currency123=6 AGENT123=5

This is my approach so far; please let me know what you think:

  class Replace:
     dateformat=DATE123
     smileys=smileys123
     currency=currency123

  count_dict={}

  function count_data(type,count):
     global count_dict
     if type in count_dict:
        count_dict[type]+=count
     else:
        count_dict = {type:count}


  f=open("chat.txt")
  while True:
     for line in f.readlines():
        print line,
        if ":)" in line:
           smileys = line.count(":)")
           count_data("smileys",smileys)
        elif "$number" in line :    #how to see whether it is currency or nor??
           currency=line.count("$number") //how can i do this
           count_data("currecny",currency)
        elif "1/2/3" in line :    #how to validate date format
           dateformat=line.count("dateformat") #how can i do this
           count_data("currency",currency)
        elif validate-agent-name in line:
           agent_name=line.count("agentname")  #How to do this get agentname in single quotes
           count_data("agent_name",agent_name)
     else:
        break
  f.close()

  for keys in count_dict:
     print keys,count_dict[keys]


  The following would be the output:

  DATE123=2  smileys123=2 Currency123=6 AGENT123=5

Upvotes: 1

Views: 759

Answers (2)

George

Reputation: 4674

Currencies i.e, $0 or $136 should be replaced with "Currency123" and 'TOM' (usually agents name in single quotes) should be replaced with AGENT123 and many more

I think your class Replace should be replaced by a dictionary. A dictionary comes with useful methods, so you can do more while writing less code. It can keep track of what each pattern should be replaced with, and it lets you change your replacement rules dynamically. Doing so should also make your code cleaner and easier to understand, and certainly shorter as you add more replacement words.

Edit: You might want to keep your list of replacement words in a text file and load them into your dictionary instead of hard-coding them into a class, which I don't think is a good idea. Since you said there are many more, this makes more sense: less code to write (and cleaner!).
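
A minimal sketch of the dictionary idea, assuming a hypothetical rules file in which each line holds a replacement token, a tab, and a regular expression. The file format, the `load_rules` name, and the patterns themselves are illustrative assumptions, not something taken from the question.

```python
import io
import re

# Hypothetical rules file: one "TOKEN<TAB>regex" pair per line.
# io.StringIO stands in for open("rules.txt") so the sketch is self-contained.
RULES_TEXT = (
    "DATE123\t\\d{1,4}[/-]\\d{1,4}[/-]\\d{1,4}\n"
    "smileys123\t:-?\\)\n"
    "Currency123\t\\$\\d+(?:\\.\\d+)?\n"
    "AGENT123\t'\\w+'\n"
)

def load_rules(handle):
    """Return a dict mapping each replacement token to a compiled pattern."""
    rules = {}
    for raw in handle:
        raw = raw.rstrip("\n")
        if not raw or raw.startswith("#"):
            continue  # skip blank lines and comment lines
        token, pattern = raw.split("\t", 1)
        rules[token] = re.compile(pattern)
    return rules

rules = load_rules(io.StringIO(RULES_TEXT))
print(sorted(rules))
```

Adding a new replacement then only means adding a line to the rules file; the counting code does not need to change.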

To write a comment in Python, use # (not //):

# Here is a comment

The style of your code isn't the best; read http://www.python.org/dev/peps/pep-0008/#pet-peeves, or even the whole document, if you want to learn better coding style.

Here are regular expressions to check whether a string is a currency amount, a quoted name such as 'Tom', or a date.

import re

while True:
    myString = input('Enter your string: ')

    # Raw strings (r'...') pass the backslashes to the regex engine unchanged.
    isMoney = re.match(r'^\$[0-9]+(,[0-9]{3})*(\.[0-9]{2})?$', myString)
    isName = re.match(r"^'+\w+'$", myString)
    # Matches MM/DD/YYYY with a year below 2000; a day-first date such as
    # 25/01/1986 does not match this pattern.
    isDate = re.match(r'^[0-1][0-9]/[0-3][0-9]/[0-1][0-9]{3}$', myString)
    # or try r'^[0-1]*?/[0-9]*/[0-9]*$' if you want 0/0/0 too...

    if isMoney:
        print('It is Money:', myString)
    elif isName:
        print('It is a Name:', myString)
    elif isDate:
        print('It is a Date:', myString)
    else:
        print('Not good.')

Sample output:

Enter your string: $100
It is Money: $100
Enter your string: 100
Not good.
Enter your string: 'Tom'
It is a Name: 'Tom'
Enter your string: Tom
Not good.
Enter your string: 01/15/1989
It is a Date: 01/15/1989
Enter your string: 01151989
Not good.

You can replace the conditions in your code with one of these isSomething variables, depending on what exactly needs to be done. I hope this helps. If you want to learn more about regular expressions, check out "Regular Expression Primer" or Python's RE page.
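
Note that the patterns above are anchored with ^ and $, so they only match when the whole input is a single token. To count tokens inside a full chat line, one option is to drop the anchors and use findall. The following is a rough sketch; the constant names and the loosened date pattern are assumptions made for illustration.

```python
import re

# Unanchored variants of the patterns above (an assumption: the originals
# use ^...$ and therefore only match a string that is entirely one token).
MONEY = re.compile(r"\$[0-9]+(?:,[0-9]{3})*(?:\.[0-9]{2})?")
NAME = re.compile(r"'\w+'")
DATE = re.compile(r"\b[0-9]{1,2}[/-][0-9]{1,2}[/-][0-9]{1,4}\b")

line = "ID674 25/01/1986 You are now chatting with 'Tom' about $0 and $136."

# findall returns every non-overlapping match, so len() gives the count.
money_hits = MONEY.findall(line)
name_hits = NAME.findall(line)
date_hits = DATE.findall(line)

print(len(money_hits), money_hits)
print(len(name_hits), name_hits)
print(len(date_hits), date_hits)
```

The groups are written as non-capturing, (?:...), because findall returns the group contents rather than the whole match when a pattern contains capturing groups.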

Upvotes: 1

alan

Reputation: 4852

This doesn't do all the replacements you said you need. But here's a way to count things in your data, using regular expressions and a default dictionary. If you really want the string replacements, I'm sure you can figure that out:

lines = [
   "ID674 25/01/1986 Thank you for :) choosing Optimus prime. Please wait for an Optimus prime Representative to respond. You are currently number 0 in the queue. You should be connected to an agent in approximately 0 minutes.. You are now chatting with 'Tom' 0",
  "ID674 2gb Hi there! Welcome to Optus Web Chat 0/0/0 . $5.45 How can I help you today?  1",
  "ID674 25-01-1986 I would like to change my bill plan from $0 with 0 expiry to something else $136. I find it very unuseful. Sam my phone no is 9838383821   2'"
]

import re
from collections import defaultdict

p_smiley = re.compile(r':\)|:-\)')
p_currency = re.compile(r'\$[\d.]+')
p_date = re.compile(r'\d{1,4}[/-]\d{1,4}[/-]\d{1,4}')

# Missing keys start at 0, so no "if key in dict" check is needed.
count_dict = defaultdict(int)

def count_data(kind, count):
    # count_dict is only mutated, never rebound, so "global" is not needed.
    count_dict[kind] += count

for line in lines:
    count_data('smiley', len(p_smiley.findall(line)))
    count_data('date', len(p_date.findall(line)))
    count_data('currency', len(p_currency.findall(line)))

for kind, count in count_dict.items():
    print(kind, count)
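
For the replacement part, re.subn returns both the new string and the number of substitutions made, so replacing and counting can happen in one pass. This is only a sketch: the tokens come from the question, while the patterns (including the agent-name pattern '\w+'), the REPLACEMENTS list, and the replace_and_count helper are assumptions.

```python
import re
from collections import defaultdict

# (token, pattern) pairs. Tokens come from the question; patterns are assumptions.
REPLACEMENTS = [
    ("DATE123", re.compile(r"\d{1,4}[/-]\d{1,4}[/-]\d{1,4}")),
    ("smileys123", re.compile(r":-?\)")),
    ("Currency123", re.compile(r"\$\d+(?:\.\d+)?")),
    ("AGENT123", re.compile(r"'\w+'")),
]

def replace_and_count(lines):
    """Replace every match with its token and tally the substitutions."""
    counts = defaultdict(int)
    replaced = []
    for line in lines:
        for token, pattern in REPLACEMENTS:
            # subn returns (new_string, number_of_substitutions_made)
            line, n = pattern.subn(token, line)
            counts[token] += n
        replaced.append(line)
    return replaced, dict(counts)

sample = [
    "ID674 25/01/1986 Thank you :) You are now chatting with 'Tom' 0",
    "ID674 25-01-1986 change my plan from $0 to $136. 2",
]
new_lines, counts = replace_and_count(sample)
for text in new_lines:
    print(text)
print(" ".join(f"{token}={n}" for token, n in counts.items()))
```

The patterns are applied in list order, so a token inserted by an earlier rule should not itself match a later pattern; the tokens used here do not.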

Upvotes: 1
