Reputation: 18933
I have a dataframe that outputs the following table. Note that 'Search term' is the index.
Search term Impressions Clicks Cost Converted clicks
american brewing 286446 104862 8034.18 6831
american brewing supplies 165235 64764 3916.48 4106
brewing supplies 123598 8131 6941.87 278
wine bottles 272969 7438 4944.7 194
www americanbrewing com 2782 1163 227.17 120
home brewing 216138 3744 3468.24 110
wine making 147985 6602 5024.54 108
If the 'Search term' (index) CONTAINS 'american brewing'
or 'americanbrewing'
, apply the label 'Brand'
, else apply 'Non-brand'
to a column with the header Label
.
Search term Impressions Clicks Cost Converted clicks Label
american brewing 286446 104862 8034.18 6831 Brand
american brewing supplies 165235 64764 3916.48 4106 Brand
brewing supplies 123598 8131 6941.87 278 Non-brand
wine bottles 272969 7438 4944.7 194 Non-brand
www americanbrewing com 2782 1163 227.17 120 Brand
home brewing 216138 3744 3468.24 110 Non-brand
wine making 147985 6602 5024.54 108 Non-brand
I have seen many examples on StackOverflow that look like this:
df['Label'] = df[df['SomeColumn'].str.contains('american brewing|americanbrewing')]
But this doesn't work because my 'SomeColumn'
is the df.index
and when I try something like:
df['Label'] = df[df.index.str.contains('american brewing|americanbrewing')]
I get the error AttributeError: 'Index' object has no attribute 'str'
I also saw examples of using np.where
which looks promising but I still run into the same problem because 'Search term'
is not a column, it's the index
.
df['Label'] = np.where(df['Search term'].str.contains('american brewing|americanbrewing', 'Brand', 'Non-brand')
Here's my full code:
import pandas as pd
import numpy as np
brand_terms = ['american brewing', 'americanbrewing']
data = pd.read_csv(r'sqr.csv', encoding='cp1252')
df = pd.DataFrame(data)
df['Search term'] = df['Search term'].replace(r'[^\w&\' ]', '', regex=True)
df['Cost'] = df['Cost'].replace(r'[^\d\.]', '', regex=True).astype('float')
#print(df.dtypes)
grouped = df.groupby('Search term')
result = grouped[['Impressions', 'Clicks', 'Cost', 'Converted clicks']].sum()
result = result.sort(['Converted clicks','Cost'], ascending=False)
#This doesn't work
result['Label'] = result.where(result['Search term'].str.contains('|'.join(brand_terms), 'Brand', 'Non-brand'))
result.to_csv('sqr_aggregate.csv')
How do I output the Label
column in the result
dataframe
based on if the Search term
(index) contains any of several possible string values? Where True
, apply Brand
, Else, apply Non-brand
to the Label
column.
Upvotes: 0
Views: 2197
Reputation: 76917
If you don't want to reset your index, here's one way to do it.
You could convert the index
to Series
and apply transformations.
In [16]: np.where(pd.Series(df.index).str.contains('american brewing|americanbrewing'),
'Brand', 'Non-Brand')
Out[16]:
array(['Brand', 'Brand', 'Non-Brand', 'Non-Brand', 'Brand', 'Non-Brand',
'Non-Brand'],
dtype='|S9')
Upvotes: 3
Reputation: 36545
Try changing your code to use df.groupby('Search term', as_index = False)
.
Upvotes: 0