The Unfun Cat

Reputation: 31918

How do I "or" each column in a DataFrame with a vector?

Let's say I have the data below:

try:
    from StringIO import StringIO
except ImportError:
    from io import StringIO
import pandas as pd
from numpy import uint8

vector = pd.Series([1, 0, 0, 1])

df = pd.read_table(StringIO("""a    b    c
1   0   0
1   1   1
0   1   1
1   1   0"""), sep=r"\s+", dtype=uint8, header=0)

How do I "or" the vector with each column in the df?

I know I can make a partial function with "or" and my vector and apply it to the df, but this is probably unidiomatic and needlessly time-consuming. What is the pandas way?

Come to think of it, the idiomatic way is probably a lambda... Is there no binary operator for this, like dataframe.div(series)? (Binary DF operations)

I'd like dataframe.or(vector)...

Upvotes: 1

Views: 67

Answers (2)

Alex Riley

Reputation: 176860

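You could pass the DataFrame and the vector, reshaped into a column, directly to np.logical_or. Recent pandas versions no longer allow indexing a Series with [:, None], so convert it to a NumPy array first:

>>> import numpy as np
>>> np.logical_or(df, vector.values[:, None])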
       a     b     c
0   True  True  True
1   True  True  True
2  False  True  True
3   True  True  True

Note that this returns a DataFrame of boolean values; you can cast back to a numeric datatype if you prefer.
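As a rough sketch of that cast, assuming the same data as in the question (rebuilt here from a dict so the snippet stands alone) and uint8 as the target dtype:

```python
import numpy as np
import pandas as pd

# Same values as the question's df, built from a dict for self-containment.
df = pd.DataFrame(
    {"a": [1, 1, 0, 1], "b": [0, 1, 1, 1], "c": [0, 1, 1, 0]},
    dtype="uint8",
)
vector = pd.Series([1, 0, 0, 1])

# Reshape the vector to a (4, 1) column so it broadcasts across every column.
bool_result = np.logical_or(df, vector.to_numpy()[:, None])

# Cast the boolean result back to the original numeric dtype.
as_uint8 = bool_result.astype("uint8")
print(as_uint8)
```

Any other integer dtype should work the same way in the astype call.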

Upvotes: 2

shx2

Reputation: 64318

You can take advantage of numpy's broadcasting, bitwise-or'ing the underlying numpy array (df.values) against the vector:

import numpy as np
new_values = df.values.astype(bool) | vector.values[:,np.newaxis].astype(bool)

This results in a NumPy array, not a DataFrame, but you can easily reconstruct the DataFrame:

new_df = pd.DataFrame(new_values, index=df.index, columns=df.columns)

Since this approach lets NumPy do the computation directly, it is likely the fastest.
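Putting the steps together, here is a minimal sketch. The helper name or_with_vector is made up for illustration and is not a pandas function. It assumes the vector has one entry per row of the DataFrame:

```python
import numpy as np
import pandas as pd

def or_with_vector(df, vector):
    """Hypothetical helper: OR every column of df with vector, row by row."""
    # Turn the vector into a boolean (n_rows, 1) column so it broadcasts
    # against the (n_rows, n_cols) array of DataFrame values.
    column = np.asarray(vector).astype(bool)[:, np.newaxis]
    values = df.to_numpy().astype(bool) | column
    # Rebuild the DataFrame, keeping the original row and column labels.
    return pd.DataFrame(values, index=df.index, columns=df.columns)

df = pd.DataFrame(
    {"a": [1, 1, 0, 1], "b": [0, 1, 1, 1], "c": [0, 1, 1, 0]},
    dtype="uint8",
)
vector = pd.Series([1, 0, 0, 1])

new_df = or_with_vector(df, vector)
print(new_df)
```

Note that the vector is matched to rows by position here, not by index label, so this only makes sense when both are in the same order.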

Upvotes: 1
