Reputation: 138
I am very new to Python and I am trying to count the number of tags within a string.
I've found people saying to count the commas and then add 1, which makes sense. What doesn't make sense is how to turn this into a column that applies to every row.
My data frame is called data and is laid out as below:
product_id  sku    total_sold  tags       total_images
grgeggre    rgerg  456         Up1_, Up2  5
I want it to look like the below:
product_id  sku    total_sold  tags       total_images  total_tags
grgeggre    rgerg  456         Up1_, Up2  5             2
I've tried:
tgs = data['tags']
tgsc = tgs.count("," in data["tags"] + str(1))
print(tgsc)
This doesn't work. Any ideas?
Upvotes: 1
Views: 35
Reputation: 2083
I think a simple lambda function for apply should do the trick:
data["total_tags"] = data["tags"].apply(lambda x : len(x.split(',')))
Explanation: data["tags"] is a single column, i.e. a Series, so this line calls Series.apply() rather than DataFrame.apply():
Series.apply() invokes the function on each value of the Series; since the function here returns a number, the result is a new Series with the same index.
DataFrame.apply(), by contrast, applies a function along an axis of the DataFrame and passes whole columns (axis=0) or whole rows (axis=1) to it as Series objects.
So we apply a function (the lambda function) to each value in the "tags" column, one per row of the dataframe.
The lambda function is an anonymous function, in this case with x as its input argument and len(x.split(',')) as its function body.
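To make this concrete, here is a small sketch that rebuilds the row from the question and applies the lambda. The second row and the helper name count_tags are made up for illustration; they are not in the original data. The last line also shows the "count the commas, then add 1" idea from the question using Series.str.count, which should give the same numbers as long as every row holds at least one tag.

```python
import pandas as pd

# First row is taken from the question; the second row is a hypothetical
# example added only to show that the function runs once per row.
data = pd.DataFrame({
    "product_id": ["grgeggre", "example_id"],
    "sku": ["rgerg", "example_sku"],
    "total_sold": [456, 10],
    "tags": ["Up1_, Up2", "A, B, C"],
    "total_images": [5, 1],
})


def count_tags(tag_string):
    """Named equivalent of the lambda (hypothetical helper name)."""
    return len(tag_string.split(","))


# Series.apply calls the function once for each value in the "tags" column.
data["total_tags"] = data["tags"].apply(lambda x: len(x.split(",")))

# The same result with a named function instead of a lambda.
named_result = data["tags"].apply(count_tags)

# Alternative: count the commas in each string and add 1.
comma_based = data["tags"].str.count(",") + 1

print(data)
```

Whether you use the lambda or a named function is mostly a matter of taste; a named function is easier to extend later, for example if some rows turn out to be empty.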
For split(), see the str.split documentation: it splits the string at the given delimiter into a list. The length of this list is the number of comma-separated tags.
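If it helps, this is what split() does on a single string, outside of pandas. The helper count_nonblank_tags is my own addition (a hypothetical name) and only matters if your data can contain empty strings or stray commas, which I am assuming might happen but is not shown in the question.

```python
tags = "Up1_, Up2"

# split(",") cuts the string at every comma and returns a list of the pieces.
parts = tags.split(",")
print(parts)       # ['Up1_', ' Up2']  (note the leading space in the second piece)
print(len(parts))  # 2

# Counting commas and adding 1 gives the same number for this string.
comma_count = tags.count(",") + 1

# Caveat: an empty string still splits into one (empty) piece.
empty_parts = "".split(",")


def count_nonblank_tags(tag_string):
    """Hypothetical helper: count only the pieces that are not blank."""
    return len([p for p in tag_string.split(",") if p.strip()])


print(count_nonblank_tags(""))           # 0
print(count_nonblank_tags("Up1_, Up2"))  # 2
```

If your tags column never contains empty strings, the plain len(x.split(',')) is enough.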
Hope this explanation helped
Upvotes: 1