Ahmad Anis

Reputation: 2704

IndexingError: Unalignable boolean Series provided as indexer

Let's say I have a data set, the head of which is as follows:

https://gist.github.com/ahmadmustafaanis/9ba3b5ea25b46b2b87ab858dc57ec15d

Now I want to check that whenever the link in df['Link'] contains 'edx' or 'coursera', the name contains it as well.

First I need to find all the links that contain 'edx' or 'coursera'. My logic is:

df['Link'][df['Link'].isnull()==False].apply(lambda a: True if 'coursera' in a else True if 'edx' in a else False)

which returns a boolean Series that is True for the links containing 'coursera' or 'edx' and False otherwise.

Now if I try to use boolean indexing on the whole data frame by wrapping this code in df[mycode] or df.loc[mycode], I get a warning and an error.

df[df['Link'][df['Link'].isnull()==False].apply(lambda a: True if 'coursera' in a else True if 'edx' in a else False)]

The warning is

<ipython-input-47-d903df486dc7>:1: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
  df[df['Link'][df['Link'].isnull()==False].apply(lambda a: True if 'coursera' in a else True if 'edx' in a else False)]

and the error message is

IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
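The message is consistent with the boolean Series being built from only the non-null rows of df['Link'], so its index is a subset of the DataFrame's index and cannot be aligned row for row. The following is a minimal sketch of that situation and one way around it, assuming a small made-up DataFrame (the rows and the "Name" column are illustrative, not the real data set):

```python
import pandas as pd

# Hypothetical stand-in for the real data: one row has a missing Link.
df = pd.DataFrame({
    "Name": ["ML on coursera", "Intro to CS", "No link", "edx Python"],
    "Link": [
        "https://www.coursera.org/learn/ml",
        "https://example.com/cs",
        None,
        "https://www.edx.org/python",
    ],
})

# Built from the non-null rows only, so its index is [0, 1, 3]
# while the DataFrame's index is [0, 1, 2, 3].
partial = df["Link"][df["Link"].notna()].apply(
    lambda a: "coursera" in a or "edx" in a
)

# Reindex to the full DataFrame index, treating rows without a link as False.
mask = partial.reindex(df.index, fill_value=False)
result = df[mask]
```

Once the mask has one entry per row of df, indexing with df[mask] or df.loc[mask] no longer has to align anything.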

Upvotes: 0

Views: 590

Answers (1)

Rob Raymond

Reputation: 31146

Neither of your lines of code fails for me. This seems a hugely complicated way to filter a dataframe. Just define a mask that is True for the rows you want, then use loc[mask].

import io
import pandas as pd
import requests
res = requests.get("https://gist.githubusercontent.com/ahmadmustafaanis/9ba3b5ea25b46b2b87ab858dc57ec15d/raw/53c5f357f2e9db0d37e420a9b18a60ac7a8bdfa6/test.csv")
df = pd.read_csv(io.StringIO(res.content.decode()))

df['Link'][df['Link'].isnull()==False].apply(lambda a: True if 'coursera' in a else True if 'edx' in a else False)
df[df['Link'][df['Link'].isnull()==False].apply(lambda a: True if 'coursera' in a else True if 'edx' in a else False)]

mask = df["Link"].str.contains("coursera") | df["Link"].str.contains("edx")
df.loc[mask]
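If the Link column can contain missing values, passing na=False to str.contains keeps the mask strictly boolean and the same length as the DataFrame. The sketch below also shows one possible way to do the check the question describes (link mentions a platform but the name does not). The sample rows and the "Name" column are assumptions for illustration, since the real column names are not shown here:

```python
import pandas as pd

# Hypothetical sample data; the real column names may differ.
df = pd.DataFrame({
    "Name": ["ML on Coursera", "Intro to CS", "No link", "Python basics"],
    "Link": [
        "https://www.coursera.org/learn/ml",
        "https://example.com/cs",
        None,
        "https://www.edx.org/python",
    ],
})

# One regex covers both platforms; na=False counts missing links as "no match".
mask = df["Link"].str.contains("coursera|edx", case=False, na=False)
matches = df.loc[mask]

# Rows whose link mentions a platform that the name does not mention.
mismatch = pd.Series(False, index=df.index)
for platform in ("coursera", "edx"):
    in_link = df["Link"].str.contains(platform, case=False, na=False)
    in_name = df["Name"].str.contains(platform, case=False, na=False)
    mismatch |= in_link & ~in_name
needs_fix = df.loc[mismatch]
```

Here matches holds every row with a coursera or edx link, and needs_fix holds only the rows where the name is missing the platform found in the link.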

Upvotes: 1
