Gabriel

Reputation: 42459

Select rows (with multiple strings) in pandas dataframe that contain only a given string

I'm manipulating the 2017 developer survey results. I want to isolate those rows which contain only the string Python in the HaveWorkedLanguage column.

This is what the df['HaveWorkedLanguage'] column looks like:

0                                                 Swift
1                         JavaScript; Python; Ruby; SQL
2                                     Java; PHP; Python
3                                        Python; R; SQL
4                                                   NaN
5                                 JavaScript; PHP; Rust
6                                        Matlab; Python
7        CoffeeScript; Clojure; Elixir; Erlang; Haskell
8                                        C#; JavaScript
9                                    Objective-C; Swift
10                                               R; SQL
11                                                  NaN
12                                         C; C++; Java
13                          Java; JavaScript; Ruby; SQL
14                                     Assembly; C; C++
15                                   JavaScript; VB.NET
16                                           JavaScript
17                     Python; Matlab; Rust; SQL; Swift
18                                               Python
19                                         Perl; Python
20                                                  NaN
21                                  C#; JavaScript; SQL
22                                                 Java
23                                          Python; SQL
24                                                  NaN
25                                          Java; Scala
26         Java; JavaScript; Objective-C; Python; Swift
27                                                  NaN
28                                               Python
29                                                  NaN
...

I tried using pandas.Series.str.match, which should "Determine if each string matches a regular expression", as shown here:

import pandas as pd
df = pd.read_csv("survey_results_public.csv")
rows_w_Python = df[df['HaveWorkedLanguage'].str.match("Python", na=False)]['HaveWorkedLanguage']

The problem is that this selects the rows where Python is the first entry, not the rows that contain only Python, which results in:

3                                        Python; R; SQL
17                     Python; Matlab; Rust; SQL; Swift
18                                               Python
23                                          Python; SQL
28                                               Python
...

How can I keep the rows that contain only Python?

Upvotes: 1

Views: 1809

Answers (1)

user2285236


For exact matching, the == operator should suffice. It doesn't require a regex.

df['HaveWorkedLanguage'] == 'Python' returns a boolean Series that is True only where the value is exactly 'Python'. NaN entries compare as False, so no na= argument is needed.

Passing this filter to the DataFrame yields:

df[df['HaveWorkedLanguage'] == 'Python']
Out: 
   HaveWorkedLanguage
18             Python
28             Python
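
For anyone without the survey CSV at hand, here is a minimal self-contained sketch of the same idea. The small DataFrame is a made-up sample that only mirrors a few rows of the column in the question, so the index labels differ from the real data. It also shows one possible regex alternative, assuming you want to stay with str.match: anchoring the end of the pattern with $ makes it behave like an exact match for these values.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring a few rows of the survey column
df = pd.DataFrame({
    "HaveWorkedLanguage": [
        "Swift",
        "JavaScript; Python; Ruby; SQL",
        "Python; R; SQL",
        np.nan,
        "Python",
        "Perl; Python",
        "Python",
    ]
})

col = df["HaveWorkedLanguage"]

# Exact equality: NaN compares as False, so missing values drop out
only_python = df[col == "Python"]

# Regex alternative: str.match anchors at the start of the string,
# and the trailing $ anchors the end, so "Python; R; SQL" is rejected
only_python_regex = df[col.str.match(r"Python$", na=False)]

print(only_python)
```

The == comparison is the simpler choice here. The regex form is mainly useful if the condition later grows beyond a single literal string.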

Upvotes: 2
