sort values and create new column based on rows Pandas

Question

I have a DataFrame from Pandas:

import pandas as pd

df = pd.DataFrame({'Job': ['Math', 'Math', 'Math', 'Math', 'Physics', 'Physics', 'Physics',
                             'English', 'English', 'English'],
                    'First_Name': ['William', 'James', 'Harper', 'William', 'Mason', 'Evelyn', 'Jacob',
                             'Eve', 'Ana', 'Theo'],
                    'Building': ['A1', 'A2', 'A3', 'A1', 'A2',  'A3', 'A1', 'A3', 'A1', 'A2'],
                    'Years_employed': [1, 2, 4, 6, 1, 4, 2, 3, 3, 2]},
                    columns=['Job', 'First_Name', 'Building', 'Years_employed'])

print(df)

What I have tried:

dfs = df.sort_values(['Building', 'Years_employed'])
dfs['answer'] = dfs['Job'].shift(-1)
dfs.loc[:, "answer"] = dfs.Job == dfs.answer

(This does not work because I don't know how to require that row N and row N-1 belong to the same building.)

How can I find out, for each building, whether a new employee has the same job as the person employed just before them in the same building?

IoaTzimas · Accepted Answer

You can use np.where with shift:

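import numpy as np

dfs = df.sort_values(['Building', 'Years_employed'])

# True only when the previous row (after sorting) has the same Job and the same Building
dfs['result'] = np.where((dfs.Job == dfs.Job.shift(1)) & (dfs.Building == dfs.Building.shift(1)), True, False)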

Example (I used another dataframe as your current one produced no True value):

df = pd.DataFrame({'Job': ['Math', 'Math', 'Math', 'Math', 'Physics', 'Physics', 'Physics',
                             'English', 'English', 'English'],
                    'First_Name': ['William', 'James', 'Harper', 'William', 'Mason', 'Evelyn', 'Jacob',
                             'Eve', 'Ana', 'Theo'],
                    'Building': ['A1', 'A1', 'A1', 'A1', 'A2',  'A3', 'A1', 'A3', 'A1', 'A2'],
                    'Years_employed': [1, 2, 4, 6, 1, 4, 2, 3, 3, 2]},
                    columns=['Job', 'First_Name', 'Building', 'Years_employed'])

Output:

       Job First_Name Building  Years_employed  result
0     Math    William       A1               1   False
1     Math      James       A1               2    True
6  Physics      Jacob       A1               2   False
8  English        Ana       A1               3   False
2     Math     Harper       A1               4   False
3     Math    William       A1               6    True
4  Physics      Mason       A2               1   False
9  English       Theo       A2               2   False
7  English        Eve       A3               3   False
5  Physics     Evelyn       A3               4   False
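
As a possible alternative (a sketch, not part of the original answer), the "same building" condition can be handled by shifting within each building via groupby, so that the first row of every building is compared against a missing value rather than against the last row of the previous building. The example below reuses the Job, Building and Years_employed columns of the example DataFrame above (First_Name is left out for brevity) and should give the same result column:

```python
import pandas as pd

df = pd.DataFrame({
    'Job': ['Math', 'Math', 'Math', 'Math', 'Physics', 'Physics', 'Physics',
            'English', 'English', 'English'],
    'Building': ['A1', 'A1', 'A1', 'A1', 'A2', 'A3', 'A1', 'A3', 'A1', 'A2'],
    'Years_employed': [1, 2, 4, 6, 1, 4, 2, 3, 3, 2],
})

dfs = df.sort_values(['Building', 'Years_employed'])

# Job of the previous row within the same building;
# NaN for the first row of each building
prev_job = dfs.groupby('Building')['Job'].shift(1)

# Comparing against NaN yields False, so the first row of a building is never True
dfs['result'] = dfs['Job'].eq(prev_job)

print(dfs)
```

Because the shift happens per group, no separate comparison of the Building column is needed.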
