Reputation: 1052
I have a dataframe DF with 2 columns:
CLASS STUDENT
'Sci' 'Francy'
'Sci' Vacant
'math' 'Alex'
'math' 'Arthur'
'math' 'Katy'
'eng' 'Jack'
'eng' Vacant
'eng' 'Francy'
'Hist' 'Francy'
'Hist' 'Francy'
I need all the classes to have 1 vacant
student. Some of them already have.
RESULT
CLASS STUDENT
'Sci' 'Francy'
'Sci' Vacant
'math' 'Alex'
'math' 'Arthur'
'math' 'Katy'
'math' Vacant
'eng' 'Jack'
'eng' Vacant
'eng' 'Francy'
'Hist' 'Francy'
'Hist' 'Francy'
'Hist' Vacant
I have tried
unique_class = DF['unique_class'].drop_duplicates()
vacant_column = pd.Series(['vacant'] * unique_class.shape[0])
temp_df = pd.concat([unique_class, vacant_column], axis=1, ignore_index=True)
DF = DF.append(temp_df, ignore_index=True)
DF.drop_duplicates(inplace=True)
It works but it seems too much. Any better way to do this?
Upvotes: 1
Views: 68
Reputation: 18647
For the record, there is nothing wrong with your solution. You can achieve the same result in a "one-liner" using practically the same approach:
df = df.append(df[['CLASS']].drop_duplicates().assign(STUDENT='Vacant')).drop_duplicates()
[out]
CLASS STUDENT
0 Sci Francy
1 Sci Vacant
2 math Alex
3 math Arthur
4 math Katy
5 eng Jack
6 eng Vacant
7 eng Francy
8 Hist Francy
2 math Vacant
8 Hist Vacant
And if required, you can chain on sort_values
and reset_index
to make the table a bit more legible:
df = (df.append(df[['CLASS']].drop_duplicates().assign(STUDENT='Vacant'))
.drop_duplicates()
.sort_values('CLASS')
.reset_index(drop=True))
[out]
CLASS STUDENT
0 Hist Francy
1 Hist Vacant
2 Sci Francy
3 Sci Vacant
4 eng Jack
5 eng Vacant
6 eng Francy
7 math Alex
8 math Arthur
9 math Katy
10 math Vacant
Upvotes: 2
Reputation: 13401
Using pd.merge
df_new = pd.DataFrame({'CLASS': df['CLASS'].unique(), 'STUDENT':'vacant'})
df_new.merge(df, how='outer', on=['CLASS','STUDENT'])
# Use `.sort_values(by='CLASS') if sorted df needed
Output:
CLASS STUDENT
0 Sci vacant
1 math vacant
2 eng vacant
3 Hist vacant
4 Sci Francy
5 math Alex
6 math Arthur
7 math Katy
8 eng Jack
9 eng Francy
10 Hist Francy
11 Hist Francy
Upvotes: 2
Reputation: 1961
Here's another way to do it:
# Copy of your data
df = pd.DataFrame({
"class": ["Sci", "Sci", "math", "math", "math", "eng", "eng", "eng", "Hist", "Hist"],
"student": ["Francy", "vacant", "Alex", "Arthur", "Katy", "Jack", "vacant", "Francy", "Francy", "Francy"]
})
# An identical DF with all students equal to "vacant"
vacant_df = pd.DataFrame({"class": df["class"], "student": "vacant"})
# Remove existing 'vacant' from original DF and concatenate with de-duplicated vacant dataframe (to avoid duplicate 'vacant' entries)
final_df = pd.concat([df.loc[df.student != "vacant", vacant_df.drop_duplicates("class")])
Original DF:
class student
8 Hist Francy
9 Hist Francy
0 Sci Francy
1 Sci vacant
5 eng Jack
6 eng vacant
7 eng Francy
2 math Alex
3 math Arthur
4 math Katy
Final DF:
class student
8 Hist Francy
9 Hist Francy
8 Hist vacant
0 Sci Francy
0 Sci vacant
5 eng Jack
7 eng Francy
5 eng vacant
2 math Alex
3 math Arthur
4 math Katy
2 math vacant
Upvotes: 2