Reputation: 1052

Add values to dataframe based on existing unique values

I have a dataframe DF with 2 columns:

CLASS   STUDENT
'Sci'   'Francy'
'Sci'   Vacant
'math'  'Alex'
'math'  'Arthur'
'math'  'Katy'
'eng'   'Jack'
'eng'   Vacant
'eng'   'Francy'
'Hist'   'Francy'
'Hist'   'Francy'

I need all the classes to have 1 vacant student. Some of them already have.

RESULT

CLASS   STUDENT
'Sci'   'Francy'
'Sci'   Vacant
'math'  'Alex'
'math'  'Arthur'
'math'  'Katy'
'math'  Vacant
'eng'   'Jack'
'eng'   Vacant
'eng'   'Francy'
'Hist'   'Francy'
'Hist'   'Francy'
'Hist'   Vacant

I have tried

unique_class = DF['unique_class'].drop_duplicates()
vacant_column = pd.Series(['vacant'] * unique_class.shape[0])
temp_df = pd.concat([unique_class, vacant_column], axis=1, ignore_index=True)
DF = DF.append(temp_df, ignore_index=True)
DF.drop_duplicates(inplace=True)

It works but it seems too much. Any better way to do this?

Upvotes: 1

Answers (3)

Chris Adams

Reputation: 18647

For the record, there is nothing wrong with your solution. You can achieve the same result in a "one-liner" using practically the same approach:

df = df.append(df[['CLASS']].drop_duplicates().assign(STUDENT='Vacant')).drop_duplicates()

[out]

  CLASS STUDENT
0   Sci  Francy
1   Sci  Vacant
2  math    Alex
3  math  Arthur
4  math    Katy
5   eng    Jack
6   eng  Vacant
7   eng  Francy
8  Hist  Francy
2  math  Vacant
8  Hist  Vacant

And if required, you can chain on sort_values and reset_index to make the table a bit more legible:

df = (df.append(df[['CLASS']].drop_duplicates().assign(STUDENT='Vacant'))
      .drop_duplicates()
      .sort_values('CLASS')
      .reset_index(drop=True))

[out]

   CLASS STUDENT
0   Hist  Francy
1   Hist  Vacant
2    Sci  Francy
3    Sci  Vacant
4    eng    Jack
5    eng  Vacant
6    eng  Francy
7   math    Alex
8   math  Arthur
9   math    Katy
10  math  Vacant

Upvotes: 2

Sociopath

Reputation: 13426

Using pd.merge

df_new = pd.DataFrame({'CLASS': df['CLASS'].unique(), 'STUDENT':'vacant'})

df_new.merge(df, how='outer', on=['CLASS','STUDENT'])

# Use `.sort_values(by='CLASS') if sorted df needed

Output:

    CLASS   STUDENT
0   Sci vacant
1   math    vacant
2   eng     vacant
3   Hist    vacant
4   Sci     Francy
5   math    Alex
6   math    Arthur
7   math    Katy
8   eng     Jack
9   eng     Francy
10  Hist    Francy
11  Hist    Francy

Upvotes: 2

Plasma

Reputation: 1961

Here's another way to do it:

# Copy of your data
df = pd.DataFrame({
    "class": ["Sci", "Sci", "math", "math", "math", "eng", "eng", "eng", "Hist", "Hist"],
    "student": ["Francy", "vacant", "Alex", "Arthur", "Katy", "Jack", "vacant", "Francy", "Francy", "Francy"]
    })

# An identical DF with all students equal to "vacant"
vacant_df = pd.DataFrame({"class": df["class"], "student": "vacant"})

# Remove existing 'vacant' from original DF and concatenate with de-duplicated vacant dataframe (to avoid duplicate 'vacant' entries)
final_df = pd.concat([df.loc[df.student != "vacant", vacant_df.drop_duplicates("class")])

Original DF:

  class student
8  Hist  Francy
9  Hist  Francy
0   Sci  Francy
1   Sci  vacant
5   eng    Jack
6   eng  vacant
7   eng  Francy
2  math    Alex
3  math  Arthur
4  math    Katy

Final DF:

  class student
8  Hist  Francy
9  Hist  Francy
8  Hist  vacant
0   Sci  Francy
0   Sci  vacant
5   eng    Jack
7   eng  Francy
5   eng  vacant
2  math    Alex
3  math  Arthur
4  math    Katy
2  math  vacant

Upvotes: 2

Add values to dataframe based on existing unique values

Answers (3)

Related Questions