Reputation: 23
I have a pandas DataFrame:
import pandas as pd

df = pd.DataFrame({'Job': ['Math', 'Math', 'Math', 'Math', 'Physics', 'Physics', 'Physics',
'English', 'English', 'English'],
'First_Name': ['William', 'James', 'Harper', 'William', 'Mason', 'Evelyn', 'Jacob',
'Eve', 'Ana', 'Theo'],
'Building': ['A1', 'A2', 'A3', 'A1', 'A2', 'A3', 'A1', 'A3', 'A1', 'A2'],
'Years_employed': [1, 2, 4, 6, 1, 4, 2, 3, 3, 2]},
columns=['Job', 'First_Name', 'Building', 'Years_employed'])
print(df)
What I have tried:
dfs = df.sort_values(['Building', 'Years_employed'])
dfs['answer'] = dfs['Job'].shift(-1)
dfs.loc[:, "answer"] = dfs.Job == dfs.answer
(This doesn't work because I don't know how to require that row N and row N-1 be in the same building.)
How can I find out, for each building, whether a new employee has the same job as the person employed just before them in the same building?
Upvotes: 2
Views: 85
Reputation: 10624
You can use np.where with shift:
import numpy as np
dfs = df.sort_values(['Building', 'Years_employed'])
# True only when the previous row has the same Job and the same Building
dfs['result'] = np.where((dfs.Job == dfs.Job.shift(1)) & (dfs.Building == dfs.Building.shift(1)), True, False)
Example (I used a different DataFrame, since yours produces no True values):
df = pd.DataFrame({'Job': ['Math', 'Math', 'Math', 'Math', 'Physics', 'Physics', 'Physics',
'English', 'English', 'English'],
'First_Name': ['William', 'James', 'Harper', 'William', 'Mason', 'Evelyn', 'Jacob',
'Eve', 'Ana', 'Theo'],
'Building': ['A1', 'A1', 'A1', 'A1', 'A2', 'A3', 'A1', 'A3', 'A1', 'A2'],
'Years_employed': [1, 2, 4, 6, 1, 4, 2, 3, 3, 2]},
columns=['Job', 'First_Name', 'Building', 'Years_employed'])
Output:
       Job  First_Name  Building  Years_employed  result
0     Math     William        A1               1   False
1     Math       James        A1               2    True
6  Physics       Jacob        A1               2   False
8  English         Ana        A1               3   False
2     Math      Harper        A1               4   False
3     Math     William        A1               6    True
4  Physics       Mason        A2               1   False
9  English        Theo        A2               2   False
7  English         Eve        A3               3   False
5  Physics      Evelyn        A3               4   False
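As a possible alternative to comparing Building explicitly, the shift can be done per building with groupby. This is only a sketch, not part of the original approach: it reuses the example DataFrame from this answer, and the column name result_groupby is made up here so it does not clash with result.

```python
import pandas as pd

# Example data copied from this answer.
df = pd.DataFrame({
    'Job': ['Math', 'Math', 'Math', 'Math', 'Physics', 'Physics', 'Physics',
            'English', 'English', 'English'],
    'First_Name': ['William', 'James', 'Harper', 'William', 'Mason', 'Evelyn', 'Jacob',
                   'Eve', 'Ana', 'Theo'],
    'Building': ['A1', 'A1', 'A1', 'A1', 'A2', 'A3', 'A1', 'A3', 'A1', 'A2'],
    'Years_employed': [1, 2, 4, 6, 1, 4, 2, 3, 3, 2],
})

dfs = df.sort_values(['Building', 'Years_employed'])

# Shift Job by one row inside each building. The first row of every
# building has no predecessor there, so it gets a missing value.
prev_job = dfs.groupby('Building')['Job'].shift(1)

# A missing value never equals a job name, so those first rows come out False.
# 'result_groupby' is a hypothetical column name chosen for this sketch.
dfs['result_groupby'] = dfs['Job'].eq(prev_job)

print(dfs)
```

On this data the new column should match the result column shown above. If "employed just before" is meant as the following row in this sort order (the person with more years), shift(-1) would be the change to make.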
Upvotes: 2