Jarad
Jarad

Reputation: 18933

Python Pandas DataFrame If Index Contains Any String Values, Apply Label, Else Apply Different Label

I have a dataframe that outputs the following table. Note that 'Search term' is the index.

Search term                 Impressions Clicks  Cost     Converted clicks
american brewing            286446      104862  8034.18  6831
american brewing supplies   165235      64764   3916.48  4106
brewing supplies            123598      8131    6941.87  278
wine bottles                272969      7438    4944.7   194
www americanbrewing com     2782        1163    227.17   120
home brewing                216138      3744    3468.24  110
wine making                 147985      6602    5024.54  108

If the 'Search term' (index) CONTAINS 'american brewing' or 'americanbrewing', apply the label 'Brand', else apply 'Non-brand' to a column with the header Label.

Search term                 Impressions Clicks  Cost     Converted clicks    Label
american brewing            286446      104862  8034.18  6831                Brand
american brewing supplies   165235      64764   3916.48  4106                Brand
brewing supplies            123598      8131    6941.87  278                 Non-brand
wine bottles                272969      7438    4944.7   194                 Non-brand
www americanbrewing com     2782        1163    227.17   120                 Brand
home brewing                216138      3744    3468.24  110                 Non-brand
wine making                 147985      6602    5024.54  108                 Non-brand

I have seen many examples on StackOverflow that look like this:

df['Label'] = df[df['SomeColumn'].str.contains('american brewing|americanbrewing')]

But this doesn't work because my 'SomeColumn' is the df.index and when I try something like:

df['Label'] = df[df.index.str.contains('american brewing|americanbrewing')]

I get the error AttributeError: 'Index' object has no attribute 'str'

I also saw examples of using np.where which looks promising but I still run into the same problem because 'Search term' is not a column, it's the index.

df['Label'] = np.where(df['Search term'].str.contains('american brewing|americanbrewing', 'Brand', 'Non-brand')

Here's my full code:

import pandas as pd
import numpy as np

brand_terms = ['american brewing', 'americanbrewing']

data = pd.read_csv(r'sqr.csv', encoding='cp1252')

df = pd.DataFrame(data)
df['Search term'] = df['Search term'].replace(r'[^\w&\' ]', '', regex=True)
df['Cost'] = df['Cost'].replace(r'[^\d\.]', '', regex=True).astype('float')
#print(df.dtypes)
grouped = df.groupby('Search term')
result = grouped[['Impressions', 'Clicks', 'Cost', 'Converted clicks']].sum()
result = result.sort(['Converted clicks','Cost'], ascending=False)

#This doesn't work
result['Label'] = result.where(result['Search term'].str.contains('|'.join(brand_terms), 'Brand', 'Non-brand'))

result.to_csv('sqr_aggregate.csv')

How do I output the Label column in the result dataframe based on if the Search term (index) contains any of several possible string values? Where True, apply Brand, Else, apply Non-brand to the Label column.

Upvotes: 0

Views: 2197

Answers (2)

Zero
Zero

Reputation: 76917

If you don't want to reset your index, here's one way to do it.

You could convert the index to Series and apply transformations.

In [16]: np.where(pd.Series(df.index).str.contains('american brewing|americanbrewing'),
                  'Brand', 'Non-Brand')
Out[16]:
array(['Brand', 'Brand', 'Non-Brand', 'Non-Brand', 'Brand', 'Non-Brand',
       'Non-Brand'],
      dtype='|S9')

Upvotes: 3

maxymoo
maxymoo

Reputation: 36545

Try changing your code to use df.groupby('Search term', as_index = False).

Upvotes: 0

Related Questions