Ariana Negrea

Reputation: 11

Replacing values for large amounts of data [Python]

I have this dataset:

product  Marketplace  product_type
1        200          X
2        300          A
2        400          A
2        200          A
3        500          A
3        400          A
3        300          B

The expected output should be:

product  Marketplace  product_type
1        200          X
2        300          A
2        400          A
2        200          A
3        500          B
3        400          B
3        300          B

Basically, I want to change the product_type values when they differ within the same product. I tried the following code, but it is extremely slow on large amounts of data. Is there anything I can do about this, or do you have any suggestions? What I have tried:

mp_correspondence = {200:1, 
                     300:2,
                     400:3,
                     500:4,
                    }
df['ranking'] = df['Marketplace'].map(mp_correspondence)
df
product_list = set(df['product'])
for i in product_list:
    df_product_frame = df[df['product'] == i].copy()
    nr_rows = df_product_frame['product'].count()
    if nr_rows > 1:
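        # NOTE: the snippet was cut off after .groupby('product'); the
        # ['product_type'].transform('first') tail is an assumed completion.
        df['product_type'] = (df.assign(ranking=df['Marketplace'].map(mp_correspondence))
                                .sort_values('ranking').groupby('product')
                                ['product_type'].transform('first'))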

Upvotes: 1

Views: 47

Answers (1)

Corralien

Reputation: 120469

I don't fully understand your code, but you can try the code below, which gives the expected output.

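Create a mapping from product to product_type by keeping the first row for each distinct product_type. This works for the sample data, but it depends on row order and on each product_type first appearing under the product it should be assigned to.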

mappings = df.drop_duplicates('product_type').set_index('product')['product_type']

df['product_type'] = df['product'].map(mappings)

Output:

>>> df
   product  Marketplace product_type
0        1          200            X
1        2          300            A
2        2          400            A
3        2          200            A
4        3          500            B
5        3          400            B
6        3          300            B

>>> mappings
product
1    X
2    A
3    B
Name: product_type, dtype: object
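
If the rule is really "take the product_type of the best-ranked Marketplace within each product", a groupby-based version avoids both a Python loop and any dependence on row order. This is only a sketch: it assumes that a lower value in mp_correspondence means higher priority (this matches the expected output but is not stated in the question), and it rebuilds the sample DataFrame so that it runs on its own.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "product": [1, 2, 2, 2, 3, 3, 3],
    "Marketplace": [200, 300, 400, 200, 500, 400, 300],
    "product_type": ["X", "A", "A", "A", "A", "A", "B"],
})

# Ranking from the question; "lower number wins" is an assumption.
mp_correspondence = {200: 1, 300: 2, 400: 3, 500: 4}

# Sort by ranking once, then give every row of a product the
# product_type of that product's best-ranked row.
best_type = (
    df.assign(ranking=df["Marketplace"].map(mp_correspondence))
      .sort_values("ranking", kind="stable")
      .groupby("product")["product_type"]
      .transform("first")
)

# Assignment aligns on the index, so the original row order is kept.
df["product_type"] = best_type
print(df)
```

The sort and the transform each run once over the whole frame, so nothing is recomputed per product.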

Upvotes: 1
