Reputation: 11
I have this dataset:
product  Marketplace  product_type
1        200          X
2        300          A
2        400          A
2        200          A
3        500          A
3        400          A
3        300          B
The expected output should be:
product  Marketplace  product_type
1        200          X
2        300          A
2        400          A
2        200          A
3        500          B
3        400          B
3        300          B
Basically, I want to change the product type values when they differ within the same product. I tried the following code, but it runs extremely slowly on large amounts of data. Is there anything I can do about this, or do you have any suggestions? What I have tried:
mp_correspondence = {200: 1,
                     300: 2,
                     400: 3,
                     500: 4,
                     }
df['ranking'] = df['Marketplace'].map(mp_correspondence)
df
product_list = set(df['product'])
for i in product_list:
    df_product_frame = df[df['product'] == i].copy()
    nr_rows = df_product_frame['product'].count()
    if nr_rows > 1:
        df['product_type'] = (df.assign(ranking=df['Marketplace'].map(mp_correspondence)) \
                              .sort_values('ranking').groupby('product')
Upvotes: 1
Views: 47
Reputation: 120469
I don't fully understand your code, but you can try the code below, which gives the expected output.
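Create a mapping from the product column to the product_type column by keeping the first row in which each distinct product_type appears. This works for the sample data because each product_type first appears under the product it should be assigned to; a product whose only type was already seen under an earlier product would get no entry in the mapping.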
mappings = df.drop_duplicates('product_type').set_index('product')['product_type']
df['product_type'] = df['product'].map(mappings)
Output:
>>> df
   product  Marketplace product_type
0        1          200            X
1        2          300            A
2        2          400            A
3        2          200            A
4        3          500            B
5        3          400            B
6        3          300            B
>>> mappings
product
1    X
2    A
3    B
Name: product_type, dtype: object
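If the intended rule is that, within each product, the product_type of the best-ranked marketplace wins, a groupby-based sketch can replace the explicit loop. That rule is an assumption on my part, inferred from the mp_correspondence dictionary in the question and the expected output, where a lower ranking number is treated as better. The sample frame is rebuilt here only to keep the snippet self-contained.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "product": [1, 2, 2, 2, 3, 3, 3],
    "Marketplace": [200, 300, 400, 200, 500, 400, 300],
    "product_type": ["X", "A", "A", "A", "A", "A", "B"],
})

# Ranking dictionary from the question (assumed: lower number = higher priority).
mp_correspondence = {200: 1, 300: 2, 400: 3, 500: 4}

# Sort rows by ranking, then broadcast the first product_type of each
# product group to every row of that group. The result keeps the original
# index labels, so the assignment aligns back to the unsorted frame.
df["product_type"] = (
    df.assign(ranking=df["Marketplace"].map(mp_correspondence))
      .sort_values("ranking")
      .groupby("product")["product_type"]
      .transform("first")
)

print(df)
```

On the sample data this gives X for product 1, A for product 2, and B for product 3. Because it is a single vectorized pass, it should scale better than filtering the frame once per product, though I have not timed it on large data.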
Upvotes: 1