Harshavardhan Ramanna

Reputation: 738

Bitwise Majority function between columns

I am trying to implement an efficient bitwise majority function between columns of a dataframe.

To keep things simple, I am showing one row (A) transposed below; the columns are 0, 1, 2 and 3.

         A      
      +-----+
   0  | 000 |
      +-----+
   1  | 111 |
      +-----+
   2  | 001 |
      +-----+
   3  | 001 |
      +-----+

      +-----+
Output| 001 |
      +-----+

The calculation finds the most frequent bit value in each position. For example, the LSB values are [0,1,1,1], so the returned LSB is 1. Similarly, the other two bits both come out as 0.

What is the best way to compute this majority function? Does the method to calculate the majority differ if the values are stored as integers?

Upvotes: 0

Views: 488

Answers (1)

WhtevrFloatsYourBoat

Reputation: 26

Second edit: It is actually easier not to split the digits into a list, but to access the i-th character of each string via Series.str.get():

df.T.apply(lambda row: ''.join([str(int(row.str.get(i).astype(int).mean() >= 0.5)) for i in range(3)]))
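As a rough, self-contained sketch of the same idea, the one-liner can be wrapped in a small helper and run on the two-row sample frame used in this answer. The name majority_str and its n_digits parameter are made up for illustration, and the sketch assumes every entry is a bit string of the same length:

```python
import pandas as pd

# Sample data from this answer: rows A and B, columns '0'..'3'
df = pd.DataFrame(
    [['000', '111', '001', '001'], ['111', '111', '101', '001']],
    columns=['0', '1', '2', '3'], index=['A', 'B'])

def majority_str(row, n_digits=3):
    # Hypothetical helper: row is a Series of equal-length bit strings.
    # For each position, take the i-th character of every string,
    # convert to int, and emit '1' if at least half of them are 1.
    return ''.join(
        str(int(row.str.get(i).astype(int).mean() >= 0.5))
        for i in range(n_digits))

# axis=1 hands each row of df to the helper, so no transpose is needed
result = df.apply(majority_str, axis=1)
print(result)
```

Using axis=1 here is only an alternative to calling apply on df.T; both pass one original row at a time to the function.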

If you have your numbers as integers instead of strings, you just have to replace the method to extract the i-th digit:

n_digits = 3
df.T.apply(lambda row: ''.join([str(int(((row // 2**i) % 2).mean() >= 0.5)) for i in range(n_digits-1, -1, -1)]))
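If the values really are stored as integers, the per-row apply could probably be avoided altogether. The following is only a sketch that swaps in NumPy broadcasting with shifts and masks instead of the apply-based approach above; df_int and the other variable names are made up for illustration, and it assumes non-negative integers that fit into n_digits bits:

```python
import numpy as np
import pandas as pd

n_digits = 3

# Assumed integer version of the sample data (0b111 == 7, and so on)
df_int = pd.DataFrame(
    [[0b000, 0b111, 0b001, 0b001], [0b111, 0b111, 0b101, 0b001]],
    columns=['0', '1', '2', '3'], index=['A', 'B'])

values = df_int.to_numpy()                  # shape (rows, columns)
shifts = np.arange(n_digits)                # bit positions, LSB first

# Extract every bit of every value: shape (rows, columns, n_digits)
bits = (values[:, :, None] >> shifts) & 1

# A bit is set in the result if at least half of the columns have it set
majority_bits = bits.mean(axis=1) >= 0.5    # shape (rows, n_digits)

# Reassemble the majority bits into one integer per row
majority = (majority_bits.astype(int) << shifts).sum(axis=1)
result = pd.Series(majority, index=df_int.index)

print(result.map(lambda v: format(v, f'0{n_digits}b')))
```

The result stays an integer per row; format(..., '03b') is only used to display it as a bit string.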

Old answer: Convert each entry to a list of integers, check if the mean is at least 0.5, and join the resulting list of Boolean values back to a string of zeros and ones.

import pandas as pd

df = pd.DataFrame([['000','111','001','001'],['111','111','101','001']], columns=['0','1','2','3'], index=['A','B'])

(df.T.apply(lambda row: 
           (row.apply(lambda x: pd.Series(list(x))).astype(int).mean() >= 0.5)
           .astype(int))
 .astype(str)
 .apply(lambda x: ''.join(x)))

Edit: Let's have a closer look at the code from the inside out: The variable x is the binary representation of a number as a string. It first gets transformed to a list of single characters, then to a Series of single characters, and then to a Series of integers:

x = '001'
print(list(x))
print(pd.Series(list(x)))
print(pd.Series(list(x)).astype(int))
>>>
['0', '0', '1']
0    0
1    0
2    1
dtype: object
0    0
1    0
2    1
dtype: int32

We apply this transformation to a whole row of df (which is a column of df.T; remember that apply works on columns by default):

row = df.loc['A']
print(row.apply(lambda x: pd.Series(list(x))).astype(int))
>>>
   0  1  2
0  0  0  0
1  1  1  1
2  0  0  1
3  0  0  1

Next comes the majority function: The i-th digit should be 1 if at least 50% of the entries in the i-th column are 1, so a tie resolves to 1. We can check this by computing the mean of the i-th column and comparing it to 0.5:

print(df.T.apply(lambda row: row.apply(lambda x: pd.Series(list(x))).astype(int).mean() >=0.5))
>>>
       A     B
0  False  True
1  False  True
2   True  True

The rest of the code converts each column, which is basically a list of Boolean values, back to a list of integers, then to a list of strings, and finally to a single string, so [False, False, True] becomes [0, 0, 1], which becomes ['0', '0', '1'], which is joined to '001'.
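These last conversion steps can be tried in isolation. This is just a small sketch on a hand-written Boolean Series; the variable names are illustrative:

```python
import pandas as pd

flags = pd.Series([False, False, True])   # majority result for one row

as_int = flags.astype(int)                # [0, 0, 1]
as_str = as_int.astype(str)               # ['0', '0', '1']
joined = ''.join(as_str)                  # iterating a Series yields its values

print(joined)
```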

Upvotes: 1
