giulio lo verde

Reputation: 1

Anomalous Relationship Between Margin and Price in Transaction Data

I’m working with a dataset for a project. Unfortunately, I don’t have detailed information about the variables beyond the descriptions I'll provide below. After cleaning the data a bit (it was a real mess), I’m left with this dataset, which I’ll link here (along with a screenshot).

One of the variables is called "margin," defined as "cumulative customer margin." This makes me think the variable is expressed as an absolute amount rather than as a ratio. I also have two other variables: "price" and "number of transactions." When filtering for number of transactions = 1, I’d expect price to always be higher than margin (assuming margin = price - cost). However, I’ve found many rows where this doesn't hold. I’m attaching a few examples in the screenshot. Any insights would be greatly appreciated!

Here is the translation of the variable description table:

| Variable | Data Challenge Description |
|---|---|
| EVENT_ID | Transaction ID |
| N_ITEMS | Total number of items purchased in the transaction |
| PROP_CONBINI | Proportion of "conbini" articles in the transaction |
| FAV_GENRE | Preferred manga genre |
| PHONE_NUMBER | Customer's phone number available |
| MAIL | Customer's e-mail address |
| YEAR | Transaction year |
| MONTH | Transaction month |
| PAYMENT_TYPE | Agreed payment method |
| BOOKS_PAID | Number of manga paid for in previous transactions |
| PRICE | Transaction price |
| N_SUBSCRIPTIONS | Number of active manga series subscriptions |
| SUBSCR_CANC | Number of canceled manga series subscriptions in the past |
| POINT_OF_SALE | Point of sale |
| AGE | Customer's age |
| DAYS_FROM_PROMO | Days since the last promo launch |
| MARGIN | Cumulative customer margin |
| N_TRANSACTIONS | Total number of transactions made by the customer |
| CUSTOMER_SINCE | Date of the customer's first transaction |
| DATE_LAST_PURCHASE | Date of the customer's last transaction |
| PAID | Amount paid (target variable) |
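
For concreteness, the consistency check described above (keep only customers with a single transaction, then flag rows where MARGIN exceeds PRICE) could be sketched in plain Python roughly as follows. The rows are invented purely for illustration and are not taken from the real dataset, and `find_anomalies` is a hypothetical helper name. The check is only meaningful under the stated assumption that margin = price - cost.

```python
# Toy rows, invented purely for illustration; NOT values from the real dataset.
rows = [
    {"EVENT_ID": 1, "N_TRANSACTIONS": 1, "PRICE": 50.0, "MARGIN": 12.0},
    {"EVENT_ID": 2, "N_TRANSACTIONS": 1, "PRICE": 30.0, "MARGIN": 45.0},
    {"EVENT_ID": 3, "N_TRANSACTIONS": 4, "PRICE": 20.0, "MARGIN": 90.0},
    {"EVENT_ID": 4, "N_TRANSACTIONS": 1, "PRICE": 80.0, "MARGIN": 20.0},
]


def find_anomalies(rows):
    """Return single-transaction rows whose MARGIN exceeds PRICE.

    Assumes margin = price - cost with a non-negative cost, so for a customer
    with exactly one transaction the cumulative margin should not exceed
    that transaction's price.
    """
    single = [r for r in rows if r["N_TRANSACTIONS"] == 1]
    return [r for r in single if r["MARGIN"] > r["PRICE"]]


anomalies = find_anomalies(rows)
anomalous_ids = [r["EVENT_ID"] for r in anomalies]
print(anomalous_ids)
```

Row 3 is deliberately left out of the check: with several transactions, a cumulative margin above a single transaction's price is not necessarily inconsistent.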

screenshot

Does anyone have any ideas? Am I missing something about the "margin" variable? I’ve also considered the possibility that it represents a relative value, but many margin values are greater than 100, which doesn’t seem possible for a percentage. I also didn't find any significant pattern with the other variables.

This variable is crucial for me because I need to infer the average cost, which I plan to use as a weight for false negatives (I’m building a classification model for credit scoring, where 1 = pays, 0 = doesn’t pay). Any suggestions or insights would be incredibly helpful!
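
To make the intended use of the variable concrete, here is a rough sketch of how an average cost might be inferred and then used to weight false negatives. It assumes margin = price - cost and uses only single-transaction rows where MARGIN does not exceed PRICE. The rows, the labels, and the helper names `implied_costs` and `weighted_fn_cost` are all hypothetical and invented for illustration.

```python
# Toy rows, invented purely for illustration; NOT values from the real dataset.
rows = [
    {"N_TRANSACTIONS": 1, "PRICE": 50.0, "MARGIN": 12.0},
    {"N_TRANSACTIONS": 1, "PRICE": 30.0, "MARGIN": 45.0},  # anomalous, skipped
    {"N_TRANSACTIONS": 4, "PRICE": 20.0, "MARGIN": 90.0},  # cumulative, skipped
    {"N_TRANSACTIONS": 1, "PRICE": 80.0, "MARGIN": 20.0},
]


def implied_costs(rows):
    """Implied cost = PRICE - MARGIN for consistent single-transaction rows."""
    return [
        r["PRICE"] - r["MARGIN"]
        for r in rows
        if r["N_TRANSACTIONS"] == 1 and r["MARGIN"] <= r["PRICE"]
    ]


def weighted_fn_cost(y_true, y_pred, fn_weight):
    """Total penalty for false negatives (true label 1, predicted 0)."""
    n_false_negatives = sum(
        1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0
    )
    return n_false_negatives * fn_weight


costs = implied_costs(rows)
avg_cost = sum(costs) / len(costs)

# Made-up labels and predictions: 1 = pays, 0 = doesn't pay.
y_true = [1, 1, 0, 1]
y_pred = [1, 0, 0, 0]
penalty = weighted_fn_cost(y_true, y_pred, fn_weight=avg_cost)
print(avg_cost, penalty)
```

Dropping the anomalous rows before averaging is itself a judgment call. If they turn out to be valid under a different definition of MARGIN, the inferred average cost would change.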
