Tags: python, python-3.x, list, algorithm, data-structures

Reputation: 15581

Number of occurrences of digit in numbers from 0 to n

Given a number n, count the total number of occurrences of the digits 0, 2 and 4 in all the numbers from 0 to n, including n.

Example 1:

n = 10
output: 4

Example 2:

n = 22
output: 11

My Code:

n = 22

def count_digit(n):
    count = 0
    for i in range(n+1):
        if '2' in str(i):
            count += 1
        if '0' in str(i):
            count += 1
        if '4' in str(i):
            count += 1
    return count

count_digit(n)

Code Output: 10

Desired Output: 11

Constraints: 1 <= N <= 10^5

Note: The solution should not cause outOfMemoryException or Time Limit Exceeded for large numbers.

Upvotes: 3

Answers (6)

cards

Reputation: 4975

Using a single-branch conditional:

def count_digit(n):
    s = '024'
    out = 0
    for integer in map(str, range(n+1)): # integer as string
        for digit in integer:
            if digit in s:
                out += 1
    return out

or more compactly

def count_digit(n):
    s = '024'
    return sum(1 for i in map(str, range(n+1)) for d in i if d in s)

Upvotes: -1

גלעד ברקן

Reputation: 23955

I ended up with a similar answer to rici's, though with a slightly different phrasing of the numeric formulation. The number of instances of each digit in each position ("counts for each column," as rici described) can be formulated in two parts. The first part is p * floor(n / (10 * p)), where p is 10 raised to the power of the position. For example, in position 0 (the rightmost), there is one 1 for every ten numbers. Counting the 0's, however, requires an additional check regarding the population of the current and next position.

To the first part we still need to add the counts attributed to the remainder of the division. For example, for n = 6, floor(6 / 10) = 0 but we do have one count of 2 and one of 4. We add p if the digit in that position in n is greater than the digit we're counting; or, if the digit is the same, we add the value on the right of the digit plus 1 (for example, for n = 45, we want to count the 6 instances where 4 appears in position 1: 40, 41, 42, 43, 44, 45).
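As a rough Python sketch of the two-part count described above, restricted to a nonzero digit d so that the extra check for 0 is not needed (the name `count_nonzero_digit` is made up for this illustration and is not part of the JavaScript below):

```python
def count_nonzero_digit(n, d):
    """Count occurrences of digit d (1..9) in the decimal forms of 0..n."""
    total = 0
    p = 1  # 10 raised to the power of the current position
    while p <= n:
        # First part: every full cycle of 10 * p numbers holds d exactly p times here.
        total += p * (n // (10 * p))
        # Second part: the remainder of the division.
        digit = (n // p) % 10   # digit of n in this position
        right = n % p           # value to the right of that digit
        if digit > d:
            total += p
        elif digit == d:
            total += right + 1
        p *= 10
    return total

print(count_nonzero_digit(45, 4))  # 11
print(count_nonzero_digit(6, 2))   # 1
```

For n = 45 and d = 4, position 1 contributes the 6 instances from 40 to 45 and position 0 contributes 5 (4, 14, 24, 34, 44), for 11 in total.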

The JavaScript code below compares the result with rici's for all numbers from 1 to 600,000, and finishes instantly. (If I'm not mistaken, rici's code wrongly returns 0 for n = 0, when the answer should be a count of 1.)

function countd(m, s = [0,2,4]) {
  if (m <= 0)
    return 0
  m += 1
  let rv = 0
  let rest = 0
  let pos = 1
  while (true) {
    const digit = m % 10
    m = Math.floor(m / 10)
    rv += m * pos * s.length
    for (const d of s) {
      if (digit > d)
        rv += pos
      else if (digit == d)
        rv += rest
    }
    if (m == 0) {
      break
    }
    rest += digit * pos
    pos *= 10
  }
  if (s.includes(0)) {
    rv -= Math.floor((10 * pos - 1) / 9) - 1
  }
  return rv
}

function f(n, ds = [0, 2, 4]) {
  // Value on the right of position
  let curr = 0;
  let m = n;
  // 10 to the power of position
  let p = 1;
  let result = 1;
  
  while (m) {
    const digit = m % 10;
    m = Math.floor(m / 10);
    for (const d of ds) {
      if (d != 0 || n >= 11 * p) {
        result += p * Math.floor((n - (d ? 0 : 10 * p)) / (10 * p));
      }
      if (digit > d && (d != 0 || m > 0)) {
        result += p;
      } else if (digit == d) {
        result += curr + 1;
      }
    }
    curr += p * digit;
    p *= 10;
  }
  
  return result;
}

for (let n = 1; n <= 600000; n += 1) {
  const _f = f(n);
  const _countd = countd(n);
  if (_f != _countd) {
    console.log(`n: ${ n }`);
    console.log(_f, _countd);
    break;
  }
}

console.log("Done.");

Upvotes: 1

rici

Reputation: 241701

TL;DR: If you do it right, you can compute the count about a thousand times faster for n close to 10**5, and since the better algorithm uses time proportional to the number of digits in n, it can easily handle even values of n too large for a 64-bit integer.

As is often the case with puzzles like this ("in the numbers from x to y, how many...?"), the key is to find a way to compute an aggregate count, ideally in O(1), for a large range. For combinatorics over the string representation of numbers, a convenient range is often something like the set of all numbers whose string representation is a given size, possibly with a specific prefix. In other words, ranges of the form [prefix*10⁴, prefix*10⁴+9999], where the number of 0s in the lower limit is the same as the number of 9s in the upper limit and as the exponent of 10 in the multiplier. (It's often actually more convenient to use half-open ranges, where the lower limit is inclusive and the upper limit is exclusive, so the above example would be [prefix*10⁴, (prefix+1)*10⁴).)

Also note that if the problem is to compute a count for [x, y), and you only know how to compute [0, y), then you just do two computations, because

count [x, y) == count [0, y) - count [0, x)

That identity is one of the simplifications which half-open intervals allow.
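A small illustration of that identity, using a brute-force `count_below` helper (a hypothetical name, standing in for whatever faster count over [0, y) is available):

```python
def count_below(y, digits="024"):
    """Occurrences of the given digits in the numbers of the half-open range [0, y)."""
    return sum(str(i).count(d) for i in range(y) for d in digits)

def count_between(x, y, digits="024"):
    """Occurrences in [x, y), obtained from two counts that both start at 0."""
    return count_below(y, digits) - count_below(x, digits)

# The question's inclusive range 0..22 is the half-open range [0, 23).
print(count_between(0, 23))   # 11
print(count_between(10, 23))  # 8
```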

That would work nicely with this problem, because it's clear how many times a digit d occurs in the set of all k-digit suffixes for a given prefix. (In the 10^k suffixes, every digit has the same frequency as every other digit; there are a total of k×10^k digits in those 10^k suffixes, and since all ten digits have the same count, that count must be k×10^(k−1).) Then you just have to add the digit count of the prefix: the prefix appears exactly 10^k times, and each appearance contributes the same count.
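To make that concrete, here is a hedged sketch of the count for one block of 10^k numbers sharing a nonzero prefix, checked against brute force. The names `block_count` and `brute_force` are assumptions made for this example and do not come from the answer's own code.

```python
def block_count(prefix, k, digits="024"):
    """Occurrences of `digits` in [prefix * 10**k, (prefix + 1) * 10**k), for prefix >= 1."""
    # Each digit occurs k * 10**(k-1) times among the k-digit suffixes (0 times when k == 0).
    per_digit_in_suffixes = k * 10**k // 10
    # The prefix is repeated once for each of the 10**k suffixes.
    in_prefix = sum(str(prefix).count(d) for d in digits)
    return len(digits) * per_digit_in_suffixes + in_prefix * 10**k

def brute_force(prefix, k, digits="024"):
    """Same count, obtained by writing out every number in the block."""
    lo = prefix * 10**k
    return sum(str(i).count(d) for i in range(lo, lo + 10**k) for d in digits)

print(block_count(72, 3), brute_force(72, 3))  # 1900 1900
```

The prefix must be nonzero here, since otherwise the zeros at the start of a suffix would be leading zeros that are never actually written.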

So you could take a number like 72483, and decompose it into the following ranges. The number of ranges roughly corresponds to the sum of the digits in 72483, plus a few ranges containing numbers with fewer digits.

[0, 9]
[10, 99]
[100, 999]
[1000, 9999]
[10000, 19999]
[20000, 29999]
[30000, 39999]
[40000, 49999]
[50000, 59999]
[60000, 69999]
[70000, 70999]
[71000, 71999]
[72000, 72099]
[72100, 72199]
[72200, 72299]
[72300, 72399]
[72400, 72409]
[72410, 72419]
[72420, 72429]
[72430, 72439]
[72440, 72449]
[72450, 72459]
[72460, 72469]
[72470, 72479]
[72480, 72480]
[72481, 72481]
[72482, 72482]
[72483, 72483]
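One way to generate such a decomposition, as an illustrative sketch only (the function `decompose` is a made-up helper and is not used by the solution in this answer):

```python
def decompose(n):
    """Split [0, n] into inclusive ranges, each a fixed prefix followed by free suffix digits."""
    s = str(n)
    length = len(s)
    ranges = []
    # All numbers with fewer digits than n: [0, 9], [10, 99], ...
    for k in range(1, length):
        ranges.append((0 if k == 1 else 10 ** (k - 1), 10**k - 1))
    # Numbers as long as n: keep the first i digits of n, then use a smaller digit
    # at position i (or, at the last position, any digit up to n's own).
    for i, ch in enumerate(s):
        free = length - 1 - i                      # number of unconstrained suffix digits
        prefix = int(s[:i]) if i else 0
        first = 1 if i == 0 and length > 1 else 0  # a multi-digit number cannot start with 0
        last = int(ch) if free == 0 else int(ch) - 1
        for v in range(first, last + 1):
            lo = (prefix * 10 + v) * 10**free
            ranges.append((lo, lo + 10**free - 1))
    return ranges

for lo, hi in decompose(72483):
    print([lo, hi])
```

For 72483 this prints the 28 ranges listed above, in the same order.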

However, in the following code, I used a slightly different algorithm, which turned out to be a bit shorter. It considers the rectangle in which all the numbers from 0 to n are written out, including leading zeros, and then computes counts for each column. A column of digits in a rectangle of sequential integers follows a simple recurring pattern; the frequency can easily be computed by starting with the completely repetitive part of the column. After the complete repetitions, the remaining digits are in order, with each one except the last one appearing the same number of times. It's probably easiest to understand that by drawing out a small example on a pad of paper, but the following code should also be reasonably clear (I hope).
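As a hedged illustration of that column pattern (a closed form written for this example, not the loop used in the function below): in the column with place value `pos`, each full cycle of `10 * pos` rows contains every digit `pos` times, and the leftover rows run through the digits in order.

```python
def column_count(m, pos, d):
    """Occurrences of digit d in the column with place value pos (1, 10, 100, ...),
    over the rows 0 .. m-1 written with leading zeros."""
    full_cycles, leftover = divmod(m, 10 * pos)
    # In the leftover rows, digit d occupies rows d*pos .. (d+1)*pos - 1, possibly cut short.
    return full_cycles * pos + min(max(leftover - d * pos, 0), pos)

# Rows 0..22 (m = 23): the units column holds a 2 in 2, 12, 22; the tens column in 20, 21, 22.
print(column_count(23, 1, 2), column_count(23, 10, 2))  # 3 3
# Zeros in the tens column include the padded rows 00..09.
print(column_count(23, 10, 0))  # 10
```

Like the rectangle it models, this sketch counts the zeros used for padding.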

The one problem with that is that it counts leading zeros which don't actually exist, so it needs to be corrected by subtracting the leading zero count. Fortunately, that count is extremely easy to compute. If you consider a range ending with a five-digit number (which itself cannot start with a zero, since it wouldn't really be a five-digit number if it started with zero), then you can see that the range includes:

10000 numbers start with a zero
1000 more numbers which have a second leading zero
100 more numbers which have a third leading zero
10 more numbers which have a fourth leading zero

No numbers have five leading zeros, because we write 0 as such, not as an empty string.

That adds up to 11110, and it's easy to see how that generalises. That value can be computed without a loop, as (10⁵ − 1) / 9 − 1. That correction is done at the end of the following function:

def countd(m, s=(0,2,4)):
    if m < 0: return 0
    m += 1
    rv = 0

    rest = 0
    pos = 1
    while True:
        digit = m % 10
        m //= 10
        rv += m * pos * len(s)
        for d in s:
            if digit > d:
                rv += pos
            elif digit == d:
                rv += rest
        if m == 0:
            break
        rest += digit * pos
        pos *= 10
    if 0 in s:
        rv -= (10 * pos - 1) // 9 - 1
    return rv

That code could almost certainly be tightened up; I was just trying to get the algorithm down. But, as it is, its execution time is measured in microseconds, not milliseconds, even for much larger values of n.
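The leading-zero correction described above can also be checked on its own by brute force. A small sketch, with `padded_leading_zeros` as a made-up helper name: it counts the padding zeros needed to write every number below 10^k with exactly k digits, treating the single digit of 0 as a real digit.

```python
def padded_leading_zeros(k):
    """Padding zeros used when 0 .. 10**k - 1 are all written with k digits."""
    return sum(k - len(str(i)) for i in range(10**k))

# Compare the brute-force total with the closed form (10**k - 1) / 9 - 1.
for k in range(1, 6):
    print(k, padded_leading_zeros(k), (10**k - 1) // 9 - 1)
```

For k = 5 both columns show 11110, matching the total worked out above.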

Here's an update of Kelly's benchmark; I removed the other solutions because they were taking too long for the last value of n:

Try it online!

Upvotes: 3

islam abdelmoumen

Reputation: 664

There are numbers in which a desired digit is repeated, such as 22, so for those you must add the number of occurrences (here 2) instead of adding 1:

>>> 
>>> string = ','.join(map(str,range(23)))
>>> 
>>> string
'0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22'
>>> 
>>> string.count('0') + string.count('2') + string.count('4')
11
>>> 



n = 22

def count_digit(n):
    count = 0
    for i in map(str,range(n+1)):

        count+=i.count('0')
        count+=i.count('2')
        count+=i.count('4')
    return count
print(count_digit(n))

That solution is fast. It can be developed to be faster by handling ten numbers at a time:

def count_digit(n):
    i=0
    count=0
    s='024'
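    while i + 9 <= n:  # only take whole blocks i..i+9 that fit inside 0..n
        # i is a multiple of 10, so str(i) is the shared prefix followed by '0'
        j = 0
        for v in str(i):
            if v in s:
                j+=1

        # 10*(j-1) for the prefix digits, plus 3 for the last digits 0, 2 and 4
        count+=3*j + (7*(j-1))
        i+=10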

    for i in range(i,n+1,1):
        for v in str(i):
            if v in s:
                count+=1


    return count

Upvotes: 2

Kelly Bundy

Reputation: 27588

Another brute force, seems faster:

def count_digit(n):
    s = str(list(range(n+1)))
    return sum(map(s.count, '024'))

Benchmark with n = 10**5:

result   time   solution

115474  244 ms  original
138895   51 ms  Kelly
138895  225 ms  islam_abdelmoumen
138895  356 ms  CodingDaveS

Code (Try it online!):

from timeit import default_timer as time

def original(n):
    count = 0
    for i in range(n+1):
        if '2' in str(i):
            count += 1
        if '0' in str(i):
            count += 1
        if '4' in str(i):
            count += 1
    return count

def Kelly(n):
    s = str(list(range(n+1)))
    return sum(map(s.count, '024'))

def islam_abdelmoumen(n):
    count = 0
    for i in map(str,range(n+1)):
        count+=i.count('0')
        count+=i.count('2')
        count+=i.count('4')
    return count

def CodingDaveS(n):
    count = 0
    for i in range(n + 1):
        if '2' in str(i):
            count += str(i).count('2')
        if '0' in str(i):
            count += str(i).count('0')
        if '4' in str(i):
            count += str(i).count('4')
    return count

funcs = original, Kelly, islam_abdelmoumen, CodingDaveS

print('result   time   solution')
print()
for _ in range(3):
    for f in funcs:
        t = time()
        print(f(10**5), ' %3d ms ' % ((time()-t)*1e3), f.__name__)
    print()

Upvotes: 0

CodingDaveS

Reputation: 29

You can increment your count like this:

def count_digit(n):
    count = 0
    for i in range(n + 1):
        if '2' in str(i):
            count += str(i).count('2')
        if '0' in str(i):
            count += str(i).count('0')
        if '4' in str(i):
            count += str(i).count('4')
    return count

In that way, edge cases like 22, 44, and so on are covered!
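A side note on the code above, offered as a sketch rather than a correction: `str.count` already returns 0 when the digit is absent, so the `in` checks are optional, and the number only needs to be converted to a string once. The name `count_digit_simple` is hypothetical.

```python
def count_digit_simple(n, digits="024"):
    """Brute-force count of the given digits in 0..n, with one str() call per number."""
    total = 0
    for i in range(n + 1):
        text = str(i)
        total += sum(text.count(d) for d in digits)
    return total

print(count_digit_simple(10))  # 4
print(count_digit_simple(22))  # 11
```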

Upvotes: 2
