Saad

Reputation: 159

Blank quotes usage in dataframe

I am trying to combine OR (|) conditions inside df.loc to extract data, but the code I have written returns every row in the CSV file. Here is the original CSV file: https://drive.google.com/open?id=16eo29mF0pn_qNw-BGpZyVM9PBxv2aN1G

import pandas as pd

df = pd.read_csv("yelp_business.csv")
df = df.loc[(df['categories'].str.contains('chinese', case = False)) | (df['name'].str.contains('subway', case = False)) | (df['categories'].str.contains('', case = False)) | (df['address'].str.contains('', case = False))]

print df

It looks like either the blank quotes '' do not work in str.contains, or OR | does not work in df.loc. Instead of returning only the rows for Chinese restaurants (4,171 of them) and the rows with the restaurant name Subway, it returns all 174,568 rows.

EDITED

The output I want is all the rows whose category is Chinese plus all the rows whose name is Subway, while allowing for the address to have no assigned value or to be null.

import pandas as pd

df = pd.read_csv("yelp_business.csv")

cusine = 'chinese'
name = 'subway'
address #address has no assigned value or is NULL

df = df.loc[(df['categories'].str.contains(cusine, case = False)) |
            (df['name'].str.contains(name, case = False)) | 
            (df['address'].str.contains(address, case = False))]


print df

This code raises NameError: name 'address' is not defined.

Upvotes: 0

Views: 57

Answers (2)

jezrael

Reputation: 862511

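str.contains('') returns True for every non-missing string, because an empty pattern matches anywhere, which is why your code returns all rows. I think you can chain the conditions with |, both between the masks and inside the regex for the categories column. To find empty strings, use ^""$: the empty fields in this file hold a literal pair of double quotes, and the pattern matches that pair anchored to the start and end of the string: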

import pandas as pd

df = pd.read_csv("yelp_business.csv")

df1 = df.loc[(df['categories'].str.contains('chinese|^""$', case = False)) |
             (df['name'].str.contains('subway', case = False)) |
             (df['address'].str.contains('^""$', case = False))]
print(len(df1))
11320

print (df1.head())

               business_id                     name neighborhood  \
9   TGWhGNusxyMaA4kQVBNeew  "Detailing Gone Mobile"          NaN   
53  4srfPk1s8nlm1YusyDUbjg              ***"Subway"    Southeast   
57  spDZkD6cp0JUUm6ghIWHzA              "Kitchen M"   Unionville   
63  r6Jw8oRCeumxu7Y1WRxT7A           "D&D Cleaning"          NaN   
88  YhV93k9uiMdr3FlV4FHjwA        "Caviness Studio"          NaN   

                          address       city state postal_code   latitude  \
9                           ***""  Henderson    NV       89014  36.055825   
53  "6889 S Eastern Ave, Ste 101"  Las Vegas    NV       89119  36.064652   
57            "8515 McCowan Road"    Markham    ON     L3P 5E5  43.867918   
63                          ***""     Urbana    IL       61802  40.110588   
88                          ***""    Phoenix    AZ       85001  33.449967   

     longitude  stars  review_count  is_open  \
9  -115.046350    5.0             7        1   
53 -115.118954    2.5             6        1   
57  -79.283687    3.0            80        1   
63  -88.207270    5.0             4        0   
88 -112.070223    5.0             4        1   

                                           categories  
9                           Automotive;Auto Detailing  
53                   Fast Food;Restaurants;Sandwiches  
57                             ***Restaurants;Chinese  
63         Home Cleaning;Home Services;Window Washing  
88  Marketing;Men's Clothing;Restaurants;Graphic D...  
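
To see the difference between the empty pattern and the anchored one without the full file, here is a minimal sketch. The three-row frame df_demo and the mask names are made up for illustration; only the column names and the literal "" convention for empty fields come from the output above.

```python
import pandas as pd

# Hypothetical miniature of the Yelp file: an empty field holds a literal pair of quotes.
df_demo = pd.DataFrame({
    "name": ['"Subway"', '"Kitchen M"', '"D&D Cleaning"'],
    "address": ['"6889 S Eastern Ave"', '"8515 McCowan Road"', '""'],
    "categories": ["Fast Food;Sandwiches", "Restaurants;Chinese", "Home Cleaning"],
})

# An empty pattern matches at the start of every string, so every row is selected.
empty_pattern_mask = df_demo["address"].str.contains("", case=False)

# Anchoring the two quote characters matches only the field that is exactly "".
blank_address_mask = df_demo["address"].str.contains('^""$')

print(empty_pattern_mask.tolist())   # [True, True, True]
print(blank_address_mask.tolist())   # [False, False, True]
```

Once an all-True mask is ORed with the other conditions, the combined mask is all True as well, which matches the 174,568 rows reported in the question.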

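EDIT: If you need to filter out the rows with empty values (stored as "" in this file) instead, exclude them with ~. Because & binds more tightly than |, the two OR conditions need their own parentheses:

df2 = df.loc[((df['categories'].str.contains('chinese', case = False)) |
              (df['name'].str.contains('subway', case = False))) &
             ~((df['address'] == '""') | (df['categories'] == '""'))]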

print (df2.head())
                business_id              name     neighborhood  \
53   4srfPk1s8nlm1YusyDUbjg          "Subway"        Southeast   
57   spDZkD6cp0JUUm6ghIWHzA       "Kitchen M"       Unionville   
96   dTWfATVrBfKj7Vdn0qWVWg  "Flavor Cuisine"      Scarborough   
126  WUiDaFQRZ8wKYGLvmjFjAw    "China Buffet"  University City   
145  vzx1WdVivFsaN4QYrez2rw          "Subway"              NaN   

                                 address       city state postal_code  \
53         "6889 S Eastern Ave, Ste 101"  Las Vegas    NV       89119   
57                   "8515 McCowan Road"    Markham    ON     L3P 5E5   
96                "8 Glen Watford Drive"    Toronto    ON     M1S 2C1   
126  "8630 University Executive Park Dr"  Charlotte    NC       28262   
145                   "5111 Boulder Hwy"  Las Vegas    NV       89122   

      latitude   longitude  stars  review_count  is_open  \
53   36.064652 -115.118954    2.5             6        1   
57   43.867918  -79.283687    3.0            80        1   
96   43.787061  -79.276166    3.0             6        1   
126  35.306173  -80.752672    3.5            76        1   
145  36.112895 -115.062353    3.0             3        1   

                                 categories  
53         Fast Food;Restaurants;Sandwiches  
57                      Restaurants;Chinese  
96           Restaurants;Chinese;Food Court  
126  Buffets;Restaurants;Sushi Bars;Chinese  
145        Sandwiches;Restaurants;Fast Food  
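
For the case in the question's edit, where a filter such as address may have no value at all, one option is to build the mask only from the filters that are actually set. This is a rough sketch, not a tested solution for the real file: filter_businesses and df_demo are hypothetical names, and the data is made up. It relies on the na and regex parameters of str.contains.

```python
import pandas as pd

def filter_businesses(df, cuisine=None, name=None, address=None):
    """Hypothetical helper: OR together only the filters that were given."""
    filters = {"categories": cuisine, "name": name, "address": address}
    mask = pd.Series(False, index=df.index)
    for column, value in filters.items():
        if value:  # skip None and '' so an unset filter cannot match every row
            # regex=False searches for literal text; na=False treats missing cells as no match
            mask |= df[column].str.contains(value, case=False, na=False, regex=False)
    return df.loc[mask]

# Made-up sample data, including missing values in name and address.
df_demo = pd.DataFrame({
    "name": ['"Subway"', '"Kitchen M"', '"D&D Cleaning"', None],
    "address": ['"6889 S Eastern Ave"', '"8515 McCowan Road"', '""', None],
    "categories": ["Fast Food;Sandwiches", "Restaurants;Chinese", "Home Cleaning", "Chinese"],
})

result = filter_businesses(df_demo, cuisine="chinese", name="subway", address=None)
print(result.index.tolist())   # [0, 1, 3]
```

Leaving address unset then simply drops that condition, instead of raising NameError or matching everything.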

Upvotes: 2

JON

Reputation: 1738

You can find detailed information about str.contains at https://pandas.pydata.org/pandas-docs/stable/generated/pandas.Series.str.contains.html
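
As a quick illustration of two parameters described on that page, regex and na, here is a small sketch. The Series names and its values are invented for the example.

```python
import pandas as pd

# Made-up names; the last entry is missing.
names = pd.Series(['"A.B Diner"', '"AxB Diner"', None])

# Default regex=True: "." is a wildcard, so both diners match.
as_regex = names.str.contains("a.b", case=False, na=False)

# regex=False: the pattern is plain text, so only the name with a real dot matches.
as_literal = names.str.contains("a.b", case=False, na=False, regex=False)

print(as_regex.tolist())     # [True, True, False]
print(as_literal.tolist())   # [True, False, False]
```

In both calls na=False makes the missing entry count as False, so the result can be used directly as a boolean mask.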

Upvotes: 0
