JK1993

Reputation: 138

How to add a column to my DataFrame that counts the tags in my "tags" column

I am very new to Python and I am trying to count the number of tags within a string.

I've found people saying to count the commas and add 1, which makes sense. What doesn't make sense is how to turn this into a column that applies to every row.
My DataFrame is called data and looks like this:

product_id  sku       total_sold  tags           total_images 
grgeggre    rgerg     456         Up1_, Up2      5

I want it to look like the below:

product_id  sku       total_sold  tags           total_images  total tags
grgeggre    rgerg     456         Up1_, Up2      5             2

I've tried:

tgs = data['tags']
tgsc = tgs.count("," in data["tags"] + str(1))
print(tgsc)

This doesn't work. Any ideas?

Upvotes: 1

Views: 35

Answers (1)

LeoE

Reputation: 2083

I think a simple lambda function for apply should do the trick:

data["total_tags"] = data["tags"].apply(lambda x: len(x.split(',')))
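For anyone who wants to try this out, here is a minimal self-contained sketch of the line above. The sample rows are only modelled on the table in the question; the second row is made up for illustration.

```python
import pandas as pd

# Sample data modelled on the question's table.
# The second row is invented purely for illustration.
data = pd.DataFrame({
    "product_id": ["grgeggre", "abcdefgh"],
    "sku": ["rgerg", "xyz"],
    "total_sold": [456, 12],
    "tags": ["Up1_, Up2", "Up3"],
    "total_images": [5, 1],
})

# Split each tags string on commas and count the resulting pieces.
data["total_tags"] = data["tags"].apply(lambda x: len(x.split(",")))

print(data[["tags", "total_tags"]])
```

The new column is added in place, so data keeps all of its original columns plus total_tags.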

Explanation: DataFrame.apply():

Apply a function along an axis of the DataFrame.
Objects passed to the function are Series objects whose index is either the DataFrame’s index (axis=0) or the DataFrame’s columns (axis=1). By default (result_type=None), the final return type is inferred from the return type of the applied function. Otherwise, it depends on the result_type argument.

See pandas documentation

So we apply a function (the lambda) to each value in the "tags" column. Strictly speaking, data["tags"] is a Series, so this call is Series.apply, which invokes the function once per element rather than once per row or column of the DataFrame; the idea is the same.
The lambda is an anonymous function that takes x (the tag string of one row) as its argument and returns len(x.split(',')).
str.split (see the str.split documentation) splits the string at the given delimiter and returns a list. The length of that list is the number of comma-separated tags.
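The "count the commas, then add 1" idea from the question can also be written directly with the Series.str.count accessor. The sketch below shows that next to an apply-based variant; it assumes that empty or missing tag strings should count as 0 tags, which the question does not state. The helper name count_tags and the sample values are hypothetical.

```python
import pandas as pd

def count_tags(value):
    """Hypothetical helper: count comma-separated tags.

    Assumption: missing or blank values count as 0 tags.
    """
    if not isinstance(value, str):
        return 0
    # Ignore pieces that are empty or whitespace only.
    return len([tag for tag in value.split(",") if tag.strip()])

# Made-up sample values, including an empty string and a missing value.
data = pd.DataFrame({"tags": ["Up1_, Up2", "Up3", "", None]})

# Commas plus one: simple, but an empty string still yields 1
# and a missing value stays missing.
data["comma_count_plus_one"] = data["tags"].str.count(",") + 1

# Apply-based variant that treats empty and missing values as 0 tags.
data["total_tags"] = data["tags"].apply(count_tags)

print(data)
```

If every row is guaranteed to hold at least one tag, the str.count version is enough; the helper only matters when the column can contain blanks or missing values.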

Hope this explanation helped

Upvotes: 1
