Reputation: 241
I have a pandas df that contains the number of visitors and the paths they took before completing the conversion goal. Each row represents a path and the number of visitors who chose it. For example, in row 1, 18 visitors visited '(entrance)' --> '/' --> '/ContactUs/Default.aspx' before reaching the target goal.
I'm only interested in the product page that a visitor was on last, and I'm trying to create a dictionary that takes the product name, such as 'VFB25AEH', as the key and the number of visits as the value.
Step1                     Step2                     Step3                   Visits
/ContactUs/Default.aspx   /                         (entrance)              18
/Products/GBR100L.aspx    /Products/VFB25AEH.aspx   /Products/RAD80L.aspx   9
/Products/VFB25AEH.aspx   (entrance)                (not set)               5
/Products/RAD80L.aspx     (entrance)                (not set)               4
The following is my code. It loops through each column of each row, takes the first product page (a step that contains '/Products/'), and is meant to save the total number of visits in a dictionary:
result = {}
for i, row in enumerate(df.values):
    for c in row:
        if 'products' in str(c).lower():
            c = c.strip('.aspx').split('/')[2]
            if c in result:
                result[c]+= 1
            result[c] = 1
The ideal result is result['VFB25AEH'] = 5, result['RAD80L'] = 4, result['GBR100L'] = 9, but it turns out that all the values in result are 1. Can someone help point out the error here?
Upvotes: 0
Views: 1479
Reputation: 734
The last three lines of your code reset result[c] back to 1 on every iteration, because the assignment runs whether or not the key is already in the dictionary. Instead you need:
if c in result:
    result[c] += 1
else:
    result[c] = 1
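To see the difference in isolation, here is a minimal sketch that runs both versions over a short list of keys. The list `keys` and the names `buggy` and `fixed` are made up for illustration and are not part of the original code.

```python
# Hypothetical sample keys; 'RAD80L' appears twice.
keys = ["RAD80L", "VFB25AEH", "RAD80L"]

# Original logic: the assignment runs on every iteration,
# so it overwrites the increment that happened just before it.
buggy = {}
for k in keys:
    if k in buggy:
        buggy[k] += 1
    buggy[k] = 1

# Corrected logic: the assignment only runs for keys not seen yet.
fixed = {}
for k in keys:
    if k in fixed:
        fixed[k] += 1
    else:
        fixed[k] = 1

print(buggy)  # {'RAD80L': 1, 'VFB25AEH': 1}
print(fixed)  # {'RAD80L': 2, 'VFB25AEH': 1}
```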
Alternatively, you could use collections.defaultdict, which starts every missing key at 0 so no membership check is needed:
import collections
result = collections.defaultdict(int)
for i, row in enumerate(df.values):
    for c in row:
        if 'products' in str(c).lower():
            c = c.strip('.aspx').split('/')[2]
            result[c] += 1
EDIT
Taking into account the requirement to sum up the number of visits, and take only the most recent product page visited:
import collections
result = collections.defaultdict(int)
for row in df.values:
    for c in row:
        if 'products' in str(c).lower():
            c = c.strip('.aspx').split('/')[2]
            # The number of visits is in the last entry in the row
            result[c] += row[-1]
            # We've found the most recent product page, so move on to the next row
            break
You don't actually need the call to enumerate(), since you weren't using the index at all.
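For reference, here is a self-contained sketch of the same approach run against the sample data from the question. It makes a few assumptions: `rows` is a plain list of tuples standing in for df.values, and `product_name` and `visits_by_last_product` are hypothetical helper names. It also swaps str.strip('.aspx') for a split on the last dot, because str.strip removes a set of characters from both ends rather than a suffix. That happens to work for the product names shown, but a hypothetical '/Products/maps.aspx' would be cut down to 'm'.

```python
import collections

# Stand-in for df.values (assumption): Step1, Step2, Step3, Visits
rows = [
    ("/ContactUs/Default.aspx", "/", "(entrance)", 18),
    ("/Products/GBR100L.aspx", "/Products/VFB25AEH.aspx", "/Products/RAD80L.aspx", 9),
    ("/Products/VFB25AEH.aspx", "(entrance)", "(not set)", 5),
    ("/Products/RAD80L.aspx", "(entrance)", "(not set)", 4),
]


def product_name(path):
    """Return the product name for a '/Products/<name>.aspx' path, else None."""
    text = str(path)
    if not text.lower().startswith("/products/"):
        return None
    filename = text.split("/")[2]
    # Split on the last dot to drop the extension without touching the name.
    return filename.rsplit(".", 1)[0]


def visits_by_last_product(rows):
    """Sum visits per product, using the first product page found in each row."""
    result = collections.defaultdict(int)
    for row in rows:
        for cell in row[:-1]:  # skip the trailing Visits column
            name = product_name(cell)
            if name is not None:
                result[name] += row[-1]
                break  # only the most recent product page counts
    return dict(result)


totals = visits_by_last_product(rows)
print(totals)  # {'GBR100L': 9, 'VFB25AEH': 5, 'RAD80L': 4}
```

With a real DataFrame, passing df.values in place of `rows` should behave the same way, provided the visit count is the last column.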
Upvotes: 1