Vishal Stark

Reputation: 95

Understanding a complex one line code - Big Mart Sales Data Set Analysis

I have been trying to learn to analyze the Big Mart Sales Data Set from this website. I am unable to decode one line of code that is a bit complex. I tried to demystify it but I wasn't able to. Kindly help me understand this line at

In [26]

df['Item_Visibility_MeanRatio'] = df.apply(lambda x: x['Item_Visibility']/visibility_item_avg['Item_Visibility'][visibility_item_avg.index == x['Item_Identifier']][0],axis=1).astype(float)

Thank you very much in advance. Happy coding.

Upvotes: 1

Views: 125

Answers (2)

Cicilio

Reputation: 432

Let's go through it step by step:

df['Item_Visibility_MeanRatio']

This part creates a new column in the DataFrame named Item_Visibility_MeanRatio.

df.apply(lambda...)

This applies a function along an axis of the DataFrame.

x['Item_Visibility']

Inside the lambda, x is a single row, so this gets that row's value from the Item_Visibility column.

visibility_item_avg['Item_Visibility'][visibility_item_avg.index == x['Item_Identifier']][0]

This part first compares the index of visibility_item_avg with the current row's x['Item_Identifier']. The comparison gives a boolean array that is True wherever the index matches. That array is then used to select the matching elements of visibility_item_avg['Item_Visibility']. The [0] at the end takes the first element of the result.
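Here is a minimal sketch of that lookup taken apart. The table contents and the identifier are made-up values for illustration; only the names visibility_item_avg, Item_Visibility and Item_Identifier come from the question.

```python
import pandas as pd

# Hypothetical stand-in for visibility_item_avg: one mean visibility per item,
# indexed by Item_Identifier. The values are invented for this example.
visibility_item_avg = pd.DataFrame(
    {"Item_Visibility": [0.02, 0.05, 0.10]},
    index=pd.Index(["FDA15", "DRC01", "FDN15"], name="Item_Identifier"),
)

item_id = "DRC01"  # plays the role of x['Item_Identifier'] for one row

# Step 1: comparing the index with one identifier gives a boolean array.
mask = visibility_item_avg.index == item_id

# Step 2: the boolean array keeps only the matching entries of the column.
matches = visibility_item_avg["Item_Visibility"][mask]

# Step 3: take the first (here, the only) match. .iloc[0] is the explicit
# positional form of the [0] used in the original line.
item_mean = matches.iloc[0]

print(mask)
print(item_mean)
```

If the identifiers in the index are unique, visibility_item_avg.loc[item_id, 'Item_Visibility'] should give the same value in a single step.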

axis=1

axis=1 applies the function to each row.
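A small sketch of the difference between the two axes, using a made-up DataFrame with hypothetical columns a and b:

```python
import pandas as pd

# Tiny invented DataFrame, only to show what the lambda receives.
df = pd.DataFrame({"a": [1, 2], "b": [10, 20]})

# axis=1: the function is called once per row; x is a Series holding that row,
# so x["a"] and x["b"] are single values.
row_sums = df.apply(lambda x: x["a"] + x["b"], axis=1)

# axis=0 (the default): the function is called once per column;
# x is a Series holding the whole column.
col_sums = df.apply(lambda x: x.sum(), axis=0)

print(row_sums)
print(col_sums)
```

This is why x['Item_Visibility'] in the line from the question is a single number and not a whole column.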

astype(float)

This converts the values to float. To make the code easier to grasp, you can always split it into separate parts and digest it little by little.
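For example, the one-liner could be split roughly like this. The sample rows are invented, the helper name visibility_ratio is hypothetical, and I am assuming visibility_item_avg is a pivot table of the mean Item_Visibility per Item_Identifier.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
df = pd.DataFrame({
    "Item_Identifier": ["FDA15", "DRC01", "FDA15"],
    "Item_Visibility": [0.02, 0.05, 0.04],
})

# Assumed construction of the lookup table: mean visibility per item
# (pivot_table aggregates with the mean by default).
visibility_item_avg = df.pivot_table(values="Item_Visibility", index="Item_Identifier")


def visibility_ratio(row):
    """Return a row's visibility divided by the mean visibility of its item."""
    same_item = visibility_item_avg.index == row["Item_Identifier"]
    item_mean = visibility_item_avg["Item_Visibility"][same_item].iloc[0]
    return row["Item_Visibility"] / item_mean


# Same shape as the original line, with the lambda replaced by a named function.
df["Item_Visibility_MeanRatio"] = df.apply(visibility_ratio, axis=1).astype(float)
print(df)
```

Each intermediate name can be printed on its own, which makes it easier to see what every piece of the original expression returns.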

To make the code faster, you can use vectorization instead of applying a lambda. Refer to the link here.
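One possible vectorized version is sketched below. It uses Series.map to look up every row's item mean at once and is not taken from the tutorial. The sample data is invented, and visibility_item_avg is again assumed to be a pivot table of the mean visibility per item.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
df = pd.DataFrame({
    "Item_Identifier": ["FDA15", "DRC01", "FDA15"],
    "Item_Visibility": [0.02, 0.05, 0.04],
})

# Assumed lookup table: mean visibility per item, indexed by Item_Identifier.
visibility_item_avg = df.pivot_table(values="Item_Visibility", index="Item_Identifier")

# Map each identifier to its item mean in one step (the Series passed to map
# is used as a lookup keyed by its index) ...
item_means = df["Item_Identifier"].map(visibility_item_avg["Item_Visibility"])

# ... then divide the two columns element by element, with no Python-level loop.
df["Item_Visibility_MeanRatio"] = df["Item_Visibility"] / item_means
print(df)
```

On a large DataFrame this avoids calling a Python function once per row, which is usually where apply with axis=1 spends its time.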

Upvotes: 1

rhug123

Reputation: 8768

df['Item_Visibility_MeanRatio'] 

This is the new column name

= df.apply(lambda x: 

applying a function to the dataframe

x['Item_Visibility'] 

take the Item_Visibility value of the current row of the original dataframe

/visibility_item_avg['Item_Visibility'][visibility_item_avg.index == x['Item_Identifier']][0] 

divide by the Item_Visibility value in the new pivot table where the Item_Identifier is equal to the Item_Identifier in the original dataframe

,axis=1) 

apply the function to each row (moving along the columns, horizontally)

.astype(float) 

convert to float type

Also, it looks like .apply is used a lot on the link you attached. It should be noted that apply is generally the slow way to do things, and there are usually alternatives to avoid using apply.
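As one such alternative, here is a rough sketch using groupby with transform. It assumes visibility_item_avg holds nothing more than the per-item mean of Item_Visibility computed from the same dataframe; if the pivot table was built differently, this will not match. The sample data is made up.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the question.
df = pd.DataFrame({
    "Item_Identifier": ["FDA15", "DRC01", "FDA15"],
    "Item_Visibility": [0.02, 0.05, 0.04],
})

# transform("mean") returns the group mean aligned to every original row,
# so no separate lookup table and no apply are needed.
item_means = df.groupby("Item_Identifier")["Item_Visibility"].transform("mean")

df["Item_Visibility_MeanRatio"] = df["Item_Visibility"] / item_means
print(df)
```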

Upvotes: 1
