Reputation: 1052

Pandas - Replace values in column with other values from the same column

Dataframe with 3 columns:

FLAG CLASS   CATEGORY
yes 'Sci'   'Alpha'
yes 'Sci'   'undefined'
yes 'math'  'Beta'
yes 'math'  'undefined'
yes 'eng'   'Gamma'
yes 'math'  'Beta'
yes 'eng'   'Gamma'
yes 'eng'   'Omega'
yes 'eng'   'Omega'
yes 'eng'   'undefined'
yes 'Geog'  'Lambda'
yes 'Art'   'undefined'
yes 'Art'   'undefined'
yes 'Art'   'undefined'
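To reproduce the problem, the table above can be built roughly as follows. This is a sketch that assumes the quotes in the listing are display-only and that all values are plain strings; the name `df` is simply the one the answers below use.

```python
import pandas as pd

# Sample data from the table above; the quotes are assumed to be display-only.
df = pd.DataFrame({
    "FLAG": ["yes"] * 14,
    "CLASS": ["Sci", "Sci", "math", "math", "eng", "math", "eng",
              "eng", "eng", "eng", "Geog", "Art", "Art", "Art"],
    "CATEGORY": ["Alpha", "undefined", "Beta", "undefined", "Gamma", "Beta", "Gamma",
                 "Omega", "Omega", "undefined", "Lambda",
                 "undefined", "undefined", "undefined"],
})
print(df)
```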

I want to fill the 'undefined' values in the CATEGORY column with the other category value (if any) that the class has. E.g. the 'Sci' class will fill its 'undefined' category with 'Alpha', and the 'math' class will fill its 'undefined' category with 'Beta'.

If a class has 2 or more categories to consider, leave it as is. E.g. the English class 'eng' has two categories, 'Gamma' and 'Omega', so the 'undefined' category for 'eng' will be left as 'undefined'.

If all the categories for a class are 'undefined', leave as 'undefined'.

Result

FLAG CLASS   CATEGORY
yes 'Sci'   'Alpha'
yes 'Sci'   'Alpha'
yes 'math'  'Beta'
yes 'math'  'Beta'
yes 'eng'   'Gamma'
yes 'math'  'Beta'
yes 'eng'   'Gamma'
yes 'eng'   'Omega'
yes 'eng'   'Omega'
yes 'eng'   'undefined'
yes 'Geog'  'Lambda'
yes 'Art'   'undefined'
yes 'Art'   'undefined'
yes 'Art'   'undefined'

IT NEEDS TO GENERALIZE. I have many classes in the CLASS column and cannot afford to hard-code 'Sci' or 'eng'.

I have been trying this with multiple np.where calls but had no luck.

Upvotes: 0

Answers (4)

Andy L.

Reputation: 25239

Edit:
I added another solution that uses isin to select the classes that qualify for updating, covering both their 'undefined' and non-'undefined' rows, and then updates exactly that slice of df.

Steps:
Create m as the Series of CLASS values that have exactly one unique non-'undefined' CATEGORY. Use isin to select the qualifying rows and turn 'undefined' into NaN. Finally, group these rows by CLASS, ffill and bfill per group to fill the NaN, and assign the result back to df.

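m = df.query('CATEGORY != "undefined"').drop_duplicates().CLASS.drop_duplicates(keep=False)
sel = df.CLASS.isin(m)  # rows whose class has exactly one known category
s = df.CATEGORY.mask(df.CATEGORY.eq('undefined'))[sel]  # 'undefined' -> NaN
df.loc[sel, 'CATEGORY'] = s.groupby(df.CLASS[sel]).transform(lambda x: x.ffill().bfill())  # fill within each class only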

This solution looks cleaner, but I don't know whether it is slower than the original solution, since it uses groupby.

Original:
My solution fills the 'undefined' rows by mapping each CLASS to its unique non-'undefined' CATEGORY value:

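# classes that have exactly one distinct known category
m = df.query('CATEGORY != "undefined"').drop_duplicates().CLASS.drop_duplicates(keep=False)
# map each 'undefined' row to its class's category; NaN where no unique category exists
t = df.query('CATEGORY == "undefined"').CLASS.map(df.loc[m.index].set_index('CLASS').CATEGORY)
# update skips NaN; calling it on df itself avoids updating a temporary column copy
df.update(t.rename('CATEGORY'))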

Out[553]:
   FLAG CLASS   CATEGORY
0   yes   Sci      Alpha
1   yes   Sci      Alpha
2   yes  math       Beta
3   yes  math       Beta
4   yes   eng      Gamma
5   yes  math       Beta
6   yes   eng      Gamma
7   yes   eng      Omega
8   yes   eng      Omega
9   yes   eng  undefined
10  yes  Geog     Lambda
11  yes   Art  undefined
12  yes   Art  undefined
13  yes   Art  undefined
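The CLASS-to-CATEGORY lookup that drives the original solution can also be made explicit. The following is only a sketch on a reduced version of the sample data: it swaps the two drop_duplicates calls for groupby with nunique and first, and the names `known`, `n_known` and `lookup` are made up for illustration.

```python
import pandas as pd

# Reduced sample: one fillable class, one ambiguous class, one all-undefined class.
df = pd.DataFrame({
    "CLASS": ["Sci", "Sci", "eng", "eng", "eng", "Art"],
    "CATEGORY": ["Alpha", "undefined", "Gamma", "Omega", "undefined", "undefined"],
})

known = df[df["CATEGORY"] != "undefined"]

# Number of distinct known categories per class.
n_known = known.groupby("CLASS")["CATEGORY"].nunique()

# Lookup CLASS -> CATEGORY, restricted to classes with exactly one known category.
lookup = known.groupby("CLASS")["CATEGORY"].first()[n_known == 1]

# Map only the 'undefined' rows; classes missing from the lookup stay 'undefined'.
is_undefined = df["CATEGORY"] == "undefined"
mapped = df.loc[is_undefined, "CLASS"].map(lookup)
df.loc[is_undefined, "CATEGORY"] = mapped.fillna("undefined")

print(df)
```

Printing `lookup` before the assignment shows which classes will be filled and with what value, which helps when checking the rule on real data.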

Upvotes: 1

BENY

Reputation: 323226

I would use ffill and bfill within groupby:

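s = df.CATEGORY.mask(df.CATEGORY.eq('undefined'))  # 'undefined' -> NaN
s2 = s.groupby(df['CLASS']).transform('nunique')  # distinct known categories per class
# transform keeps the original index, so the filled values align with df on assignment
df.loc[s2.eq(1) & s.isnull(), 'CATEGORY'] = s.groupby(df.CLASS).transform(lambda x: x.ffill().bfill())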
df
Out[388]: 
   FLAG CLASS   CATEGORY
0   yes   Sci      Alpha
1   yes   Sci      Alpha
2   yes  math       Beta
3   yes  math       Beta
4   yes   eng      Gamma
5   yes  math       Beta
6   yes   eng      Gamma
7   yes   eng      Omega
8   yes   eng      Omega
9   yes   eng  undefined
10  yes  Geog     Lambda
11  yes   Art  undefined
12  yes   Art  undefined
13  yes   Art  undefined
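Since only classes with a single known category are filled, the ffill/bfill step can be replaced by taking the first non-null value per class. Below is a sketch of that variant wrapped in a function; the function name and its parameters are hypothetical, and it returns a copy rather than modifying df in place.

```python
import pandas as pd


def fill_single_category(df, key="CLASS", col="CATEGORY", missing="undefined"):
    """Fill `missing` entries of `col` when the row's `key` group has exactly one known value."""
    known = df[col].mask(df[col].eq(missing))      # placeholder -> NaN
    grouped = known.groupby(df[key])
    n_unique = grouped.transform("nunique")        # distinct known values per group
    first_known = grouped.transform("first")       # first non-null value per group
    fill_here = known.isna() & n_unique.eq(1)      # missing rows in unambiguous groups
    out = df.copy()
    out.loc[fill_here, col] = first_known[fill_here]
    return out


sample = pd.DataFrame({
    "CLASS": ["Sci", "Sci", "eng", "eng", "eng", "Art"],
    "CATEGORY": ["undefined", "Alpha", "Gamma", "Omega", "undefined", "undefined"],
})
result = fill_single_category(sample)
print(result)
```

The 'Sci' group deliberately starts with 'undefined' here to show that the fill does not depend on row order within a class.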

Upvotes: 2

Rajat Jain

Reputation: 2032

Try the following, which fills a class only when it has exactly one known category:

s = df['CATEGORY'].mask(df['CATEGORY'].eq('undefined'))  # 'undefined' -> NaN
df['CATEGORY'] = s.groupby(df['CLASS']).transform(lambda x: x.fillna(x.mode()[0]) if x.nunique() == 1 else x).fillna('undefined')

Upvotes: 1

Zaynul Abadin Tuhin

Reputation: 31993

You can do this using boolean indexing:

df.loc[(df['CLASS'] == 'Sci') & (df['CATEGORY'] == 'undefined'), 'CATEGORY'] = 'Alpha'
df.loc[(df['CLASS'] == 'math') & (df['CATEGORY'] == 'undefined'), 'CATEGORY'] = 'Beta'
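The same boolean-indexing idea can be applied without hard-coding class names by deriving the candidates from the data. This is a minimal sketch on a reduced sample; the variable names are illustrative, and the loop over classes is likely slower than a vectorised groupby approach when there are very many classes.

```python
import pandas as pd

df = pd.DataFrame({
    "CLASS": ["Sci", "Sci", "eng", "eng", "eng", "Art"],
    "CATEGORY": ["Alpha", "undefined", "Gamma", "Omega", "undefined", "undefined"],
})

known = df[df["CATEGORY"] != "undefined"]

# For every class, the array of distinct known categories.
categories_by_class = known.groupby("CLASS")["CATEGORY"].unique()

for cls, values in categories_by_class.items():
    if len(values) == 1:  # exactly one candidate, so the fill is unambiguous
        rows = (df["CLASS"] == cls) & (df["CATEGORY"] == "undefined")
        df.loc[rows, "CATEGORY"] = values[0]

print(df)
```

Classes with no known category at all (such as 'Art') never appear in `categories_by_class`, so their rows are left untouched.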

Upvotes: 0
