Reputation: 2704
Let's say I have a dataset, the head of which is as follows:
https://gist.github.com/ahmadmustafaanis/9ba3b5ea25b46b2b87ab858dc57ec15d
Now I want to check that whenever the link in df['Link'] contains 'edx' or 'coursera', the name contains it too.
First I need to find all the links that contain 'edx' or 'coursera'. My logic is:
df['Link'][df['Link'].isnull()==False].apply(lambda a: True if 'coursera' in a else True if 'edx' in a else False)
which returns a boolean Series that is True for the links containing 'coursera' or 'edx'.
Now if I try to use boolean indexing to select from the whole data frame by wrapping this code inside df[mycode] or df.loc[mycode], I get a warning and an error.
df[df['Link'][df['Link'].isnull()==False].apply(lambda a: True if 'coursera' in a else True if 'edx' in a else False)]
The warning is:
<ipython-input-47-d903df486dc7>:1: UserWarning: Boolean Series key will be reindexed to match DataFrame index.
df[df['Link'][df['Link'].isnull()==False].apply(lambda a: True if 'coursera' in a else True if 'edx' in a else False)]
and the error message is:
IndexingError: Unalignable boolean Series provided as indexer (index of the boolean Series and of the indexed object do not match).
Upvotes: 0
Views: 590
Reputation: 31146
Neither of your lines of code fails for me. It seems a hugely complicated way to filter a dataframe. Just define a mask that is True
for the rows you want, then use loc[mask]:
import io
import pandas as pd
import requests
res = requests.get("https://gist.githubusercontent.com/ahmadmustafaanis/9ba3b5ea25b46b2b87ab858dc57ec15d/raw/53c5f357f2e9db0d37e420a9b18a60ac7a8bdfa6/test.csv")
df = pd.read_csv(io.StringIO(res.content.decode()))
# the two lines from the question, unchanged
df['Link'][df['Link'].isnull()==False].apply(lambda a: True if 'coursera' in a else True if 'edx' in a else False)
df[df['Link'][df['Link'].isnull()==False].apply(lambda a: True if 'coursera' in a else True if 'edx' in a else False)]
# simpler: build a boolean mask over the full index, then select with loc
mask = df["Link"].str.contains("coursera") | df["Link"].str.contains("edx")
df.loc[mask]
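As for why the error can show up: a likely cause is that filtering out the null links first leaves the boolean Series with a shorter index than df, so pandas cannot align it with every row. Here is a minimal sketch on made-up data (the rows and the "Name" column are assumptions, not taken from the gist). It builds the mask over the full index with na=False and then runs the name check the question was after.

```python
import pandas as pd

# Toy data standing in for the gist's CSV; the rows and the "Name" column
# are assumptions made for illustration only.
df = pd.DataFrame({
    "Name": ["ML on Coursera", "Intro to CS", "Some book", "edX Python"],
    "Link": [
        "https://www.coursera.org/learn/ml",
        "https://www.edx.org/course/cs50",
        None,
        "https://www.edx.org/course/python",
    ],
})

# The question's approach: dropping nulls first gives a Series that covers
# only the non-null rows, so its index no longer matches df's index.
partial = df["Link"][df["Link"].notna()].apply(
    lambda a: "coursera" in a or "edx" in a
)

# A mask over the full index: na=False marks missing links as False
# instead of NaN, so the mask has one boolean per row of df.
mask = df["Link"].str.contains("coursera|edx", na=False)
matches = df.loc[mask]

# Which platform each link mentions (NaN where neither appears).
platform = df["Link"].str.extract(r"(coursera|edx)", expand=False)

# True where the name (lower-cased) also mentions that platform.
name_ok = pd.Series(
    [isinstance(p, str) and p in str(n).lower()
     for p, n in zip(platform, df["Name"])],
    index=df.index,
)

# Rows whose link mentions a platform but whose name does not.
mismatched = df.loc[mask & ~name_ok]
print(mismatched)
```

Because mask and name_ok share df's index, they can be combined with & and ~ and passed straight to df.loc without any reindexing.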
Upvotes: 1