ProsperousHeart

Reputation: 321

pandas.isin broken for all caps?

I found the isin function for pandas, but it looks like values in all caps are not matched?

import pandas as pd
df = pd.read_json('{"Technology Group":{"0":"Cloud","1":"Cloud","2":"Cloud","3":"Collaboration","4":"Collaboration","5":"Collaboration","6":"Collaboration","7":"Collaboration","8":"Collaboration","9":"Core", "10": "Software"},"Technology":{"0":"AMP","1":"EWS","2":"Webex","3":"Telepresence","4":"Call Manager","5":"Contact Center","6":"MS Voice","7":"Apps","8":"PRIME  ","9":"Wirelees", "10": "Prime Infrastructure"}}')

+------------------+----------------------+
| Technology Group | Technology           |
+------------------+----------------------+
| Cloud            | AMP                  |
+------------------+----------------------+
| Cloud            | EWS                  |
+------------------+----------------------+
| Cloud            | Webex                |
+------------------+----------------------+
| Collaboration    | Telepresence         |
+------------------+----------------------+
| Collaboration    | Call Manager         |
+------------------+----------------------+
| Collaboration    | Contact Center       |
+------------------+----------------------+
| Collaboration    | MS Voice             |
+------------------+----------------------+
| Collaboration    | Apps                 |
+------------------+----------------------+
| Collaboration    | PRIME                |
+------------------+----------------------+
| Core             | Wirelees             |
+------------------+----------------------+
| Software         | Prime Infrastructure |
+------------------+----------------------+

tech_input2 = ['AMP', 'Call Manager', 'PRIME']
df = df[df['Technology'].isin(tech_input2)]

It will show the following table:

+------------------+--------------+
| Technology Group | Technology   |
+------------------+--------------+
| Cloud            | AMP          |
+------------------+--------------+
| Collaboration    | Call Manager |
+------------------+--------------+

... instead of:

+------------------+--------------+
| Technology Group | Technology   |
+------------------+--------------+
| Cloud            | AMP          |
+------------------+--------------+
| Collaboration    | Call Manager |
+------------------+--------------+
| Collaboration    | PRIME        |
+------------------+--------------+

Is this a bug, or did I do something wrong? It's not technically a duplicate of the last row in the original table, but I'm not sure how to interpret the result. It seems to behave more like contains than isin ...

Upvotes: 0

Views: 77

Answers (1)

CAppajigowda

Reputation: 458

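This is probably due to whitespace: the value in your data is "PRIME  " with two trailing spaces, and isin only matches values that are exactly equal. strip() removes characters from both ends of a string based on its argument (a string specifying the set of characters to remove); with no argument it removes whitespace, and .str.strip() applies that to every value in the column.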

import pandas as pd
from io import StringIO

# Same data as in the question; "PRIME  " has two trailing spaces.
data = (
    '{"Technology Group":{"0":"Cloud","1":"Cloud","2":"Cloud","3":"Collaboration",'
    '"4":"Collaboration","5":"Collaboration","6":"Collaboration","7":"Collaboration",'
    '"8":"Collaboration","9":"Core","10":"Software"},'
    '"Technology":{"0":"AMP","1":"EWS","2":"Webex","3":"Telepresence",'
    '"4":"Call Manager","5":"Contact Center","6":"MS Voice","7":"Apps","8":"PRIME  ",'
    '"9":"Wirelees","10":"Prime Infrastructure"}}'
)
df = pd.read_json(StringIO(data))

df['Technology'] = df['Technology'].str.strip()
tech_input2 = ['AMP', 'Call Manager', 'PRIME']
df = df[df['Technology'].isin(tech_input2)]
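
If you want to confirm that whitespace is the cause before changing the column, a quick check along these lines may help. This is only a sketch: the three-row frame is a hypothetical cut-down sample of the question's data, and raw_mask, clean_mask and filtered are names made up for illustration.

```python
import pandas as pd

# Hypothetical three-row sample; "PRIME  " keeps the two trailing
# spaces from the question's data.
df = pd.DataFrame({
    "Technology Group": ["Cloud", "Collaboration", "Collaboration"],
    "Technology": ["AMP", "Call Manager", "PRIME  "],
})
tech_input2 = ["AMP", "Call Manager", "PRIME"]

# Printing the raw values as a list makes trailing whitespace visible.
print(df["Technology"].tolist())

# isin compares whole values exactly, so "PRIME  " does not match "PRIME".
raw_mask = df["Technology"].isin(tech_input2)

# Strip only for the comparison; the stored column is left unchanged.
clean_mask = df["Technology"].str.strip().isin(tech_input2)

filtered = df[clean_mask]
print(filtered)
```

Building the mask from a stripped copy keeps the original values intact, which can be useful if the column is written back out later. Assigning the stripped column back, as above, is simpler if you never need the padded values.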

Upvotes: 1
