user10772275

Get the most frequent pair of consecutive numbers

import itertools, numpy as np

a = [1,2,3,4,5]
b = [5,2,3,6,7]
c = [5,2,3,8,9]

Getting the most frequent single numbers works:

data = np.array([a,b,c]).flatten()
print(data)

values, counts = np.unique(data, return_counts=True)

for value, frequency in zip(values, counts):
    print(value, frequency)

How can I get the most frequent pair of consecutive numbers? The expected answer is [2,3], but how do I compute it programmatically?

Upvotes: 0

Views: 223

Answers (2)

hiro protagonist

Reputation: 46849

You could use collections.Counter and iterate over the data in consecutive pairs:

import numpy as np
from collections import Counter

a = [1,2,3,4,5]
b = [5,2,3,6,7]
c = [5,2,3,8,9]

data = np.array([a,b,c]).flatten()

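# use a new name so that the list c is not overwritten
counter = Counter(zip(data, data[1:]))
print(counter.most_common(1))
# [((2, 3), 3)]
# (newer NumPy versions may display the numbers as np.int64(2), np.int64(3))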

This tells you that the pair (2, 3) occurred 3 times.


A bit more detail:

data[1:]

is your data without its first element.

zip(data, data[1:])

zip is then used to generate the consecutive pairs (as tuples):

(1, 2), (2, 3), (3, 4), (4, 5), (5, 5), (5, 2), (2, 3), ...

The Counter then just counts how many times they appear and stores them dict-like:

Counter({(2, 3): 3, (5, 2): 2, (1, 2): 1, (3, 4): 1, (4, 5): 1, (5, 5): 1, (3, 6): 1,
         (6, 7): 1, (7, 5): 1, (3, 8): 1, (8, 9): 1})
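To see where pairs such as (5, 5) come from, here is a minimal sketch using plain lists instead of the flattened array (the name flat is only for illustration). Flattening joins the lists end to end, so the last element of one list gets paired with the first element of the next:

```python
a = [1, 2, 3, 4, 5]
b = [5, 2, 3, 6, 7]

# Plain-list stand-in for the flattened NumPy array.
flat = a + b

# zip stops at the shorter input, so there is one pair fewer than elements.
pairs = list(zip(flat, flat[1:]))

print(pairs[3:6])             # [(4, 5), (5, 5), (5, 2)]
print(len(flat), len(pairs))  # 10 9
```

The pair (5, 5) spans the boundary between a and b; it does not occur inside either list.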

Update: if you do not want pairs that span two different lists, you can do this:

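# a, b and c must still be the original lists here
data = (a, b, c)

counter = Counter()
for d in data:
    # count the pairs of each list separately
    counter.update(zip(d, d[1:]))
print(counter)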

or directly:

counter = Counter(pair for d in data for pair in zip(d, d[1:]))
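The same idea might be extended to runs of more than two numbers. The following is only a sketch; the helper name most_frequent_run and its parameter n are made up for illustration. It zips n shifted copies of each list, so windows never cross a list boundary:

```python
from collections import Counter

a = [1, 2, 3, 4, 5]
b = [5, 2, 3, 6, 7]
c = [5, 2, 3, 8, 9]


def most_frequent_run(rows, n=2):
    """Hypothetical helper: most common run of n consecutive values.

    Windows are built per row, so they never span two rows.
    Raises IndexError if no row has at least n elements.
    """
    counts = Counter(
        window
        for row in rows
        # zip over n shifted slices yields every length-n window of the row
        for window in zip(*(row[i:] for i in range(n)))
    )
    return counts.most_common(1)[0]


print(most_frequent_run([a, b, c]))       # ((2, 3), 3)
print(most_frequent_run([a, b, c], n=3))  # ((5, 2, 3), 2)
```

With n=2 this reduces to the zip(d, d[1:]) approach above.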

Upvotes: 3

akilat90

Reputation: 5696

You can use Counter as suggested by @hiro protagonist, but since you want to treat one row at a time, you have to apply it along the rows.

import numpy as np
from collections import Counter

Apply along rows using numpy:

data = np.array([a,b,c])

np.apply_along_axis(lambda x: Counter(zip(x, x[1:])), 1, data).sum().most_common(1)
[((2, 3), 3)]

Or, if using pandas:

import pandas as pd
data = np.array([a,b,c])
df = pd.DataFrame(data)

Now, apply Counter along rows:

df.apply(lambda x: Counter(zip(x, x[1:])), axis = 1).sum().most_common(1)

[((2, 3), 3)]
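If you would rather stay in NumPy and reuse np.unique from the question, one possible alternative (a sketch, not the Counter approach above) is to build an array of row-wise pairs and count its unique rows with axis=0. The names pairs, unique_pairs and best are only illustrative:

```python
import numpy as np

a = [1, 2, 3, 4, 5]
b = [5, 2, 3, 6, 7]
c = [5, 2, 3, 8, 9]
data = np.array([a, b, c])

# Pair each element with its right-hand neighbour in the same row,
# then reshape to one pair per line: shape (number_of_pairs, 2).
pairs = np.stack([data[:, :-1], data[:, 1:]], axis=-1).reshape(-1, 2)

# Count identical rows (pairs) instead of single values.
unique_pairs, counts = np.unique(pairs, axis=0, return_counts=True)

best = counts.argmax()
print(unique_pairs[best].tolist(), int(counts[best]))  # [2, 3] 3
```

Because the slicing is done per row, no pair spans two rows. If several pairs tie for the highest count, argmax returns only the first one.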

Upvotes: 0
