Katsiaryna Shkirych

Reputation: 5

Compare two DataFrame columns and add two new columns based on the result in Python pandas

I have a DataFrame like this:

  category  uid sales_1 sales_2
0  Grocery    1      XX      XX
1  Grocery    2      XX      ZZ
2   Sports    3      XX      ZZ
3  Grocery    4      ZZ      XX
4   Beauty    5      ZZ      ZZ
5   Beauty    6      ZZ      ZZ
6   Sports    7      ZZ      XX
7  Grocery    8      ZZ      XX
...

I need to compare the sales_1 column with the sales_2 column. The result of the comparison should be reflected in two new columns, first and second. If sales_1 == sales_2, the values in these two new columns should be 'no changes' and 'OK'. If sales_1 != sales_2, the values should be 'changed' and 'gap'. In the end I would like to have the following DataFrame:

  category  uid sales_1 sales_2       first second
0  Grocery    1      XX      XX  no changes     OK
1  Grocery    2      XX      ZZ     changed    gap
2   Sports    3      XX      ZZ     changed    gap
3  Grocery    4      ZZ      XX     changed    gap
4   Beauty    5      ZZ      ZZ  no changes     OK
5   Beauty    6      ZZ      ZZ  no changes     OK
6   Sports    7      ZZ      XX     changed    gap
7  Grocery    8      ZZ      XX     changed    gap
...

I would really appreciate any suggestions.

Upvotes: 0

Views: 51

Answers (3)

jlesueur

Reputation: 326

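You can use the where() function from NumPy, which picks the first value where the condition is True and the second where it is False:

import numpy as np

df['first'] = np.where(df.sales_1 == df.sales_2, 'no changes', 'changed')
df['second'] = np.where(df.sales_1 == df.sales_2, 'OK', 'gap')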
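
For anyone who wants to run this end to end, here is a minimal sketch, assuming a small DataFrame built from a few of the rows in the question. The variable name `same` is just an illustrative choice; it stores the comparison once so both new columns reuse it.

```python
import numpy as np
import pandas as pd

# A few sample rows taken from the question (not the full data set)
df = pd.DataFrame({
    "category": ["Grocery", "Grocery", "Sports", "Beauty"],
    "uid": [1, 2, 3, 5],
    "sales_1": ["XX", "XX", "XX", "ZZ"],
    "sales_2": ["XX", "ZZ", "ZZ", "ZZ"],
})

# Boolean Series: True where the two sales columns match
same = df["sales_1"] == df["sales_2"]

# np.where(condition, value_if_true, value_if_false), applied element-wise
df["first"] = np.where(same, "no changes", "changed")
df["second"] = np.where(same, "OK", "gap")

print(df)
```

Computing the mask once is only a small tidy-up; the result should be the same as comparing the columns inside each np.where call.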

Upvotes: 1

Maxim Ivanov

Reputation: 448

You can first assign default values to the first and second columns, then overwrite them only in the rows where the sales values differ, using a boolean mask with df.loc.


import pandas as pd

df = pd.DataFrame(
    {
        'category': ['Grocery', 'Sports', 'Beauty'],
        'sales_1': ['XX', 'ZZ', 'XX'],
        'sales_2': ['XX', 'XY', 'ZZ'],
    }
)

changed_sales = df['sales_1'] != df['sales_2']

df['first'] = 'no changes'
df.loc[changed_sales, 'first'] = 'changed'
df['second'] = 'OK'
df.loc[changed_sales, 'second'] = 'gap'

print(df)

Output

  category sales_1 sales_2       first second
0  Grocery      XX      XX  no changes     OK
1   Sports      ZZ      XY     changed    gap
2   Beauty      XX      ZZ     changed    gap

Upvotes: 1

djangoliv

Reputation: 1788

You can use a list comprehension:

df['first'] = ["no changes" if s1 == s2 else "changed" for s1, s2 in zip(df['sales_1'], df['sales_2'])]
df['second'] = ["OK" if s1 == s2 else "gap" for s1, s2 in zip(df['sales_1'], df['sales_2'])]
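
If looping over the rows twice bothers you, a possible variant is to build both labels in a single pass and split them afterwards. This is only a sketch on a tiny made-up DataFrame; the name `labels` is illustrative, and it assumes the DataFrame has at least one row (unpacking an empty zip would fail).

```python
import pandas as pd

# Tiny illustrative DataFrame with the two columns being compared
df = pd.DataFrame({
    "sales_1": ["XX", "XX", "ZZ"],
    "sales_2": ["XX", "ZZ", "ZZ"],
})

# One pass over the rows: each element is a (first, second) pair
labels = [
    ("no changes", "OK") if s1 == s2 else ("changed", "gap")
    for s1, s2 in zip(df["sales_1"], df["sales_2"])
]

# Transpose the list of pairs into two sequences, one per new column
first, second = zip(*labels)
df["first"] = list(first)
df["second"] = list(second)

print(df)
```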

Upvotes: 0
