Reputation: 14062
For the column labelled "Category", I want to fill the cells with white spaces with the value from above, see df number 2 in image below.
Here is what I tried but it didn't work:
df[df['Category']==" "] = np.NaN
df = df['Category'].fillna(method='ffill')
col = ['Category']
df.loc[:,col] = df.loc[:,col].ffill()
df.loc[:,['Category']] = df.loc[:,['Category']].ffill()
df = df.set_index(df.columns[0])
col = ['Category']
df.loc[:,col] = df.loc[:,col].ffill()
I want to then keep the 2nd instance of each row (if that makes sense), see df 3 in image below.
Upvotes: 0
Views: 1180
Reputation: 4827
You can try:
import pandas as pd
import numpy as np
df = pd.DataFrame(
[['Base', 152],
['Male', 98], ['-', .64],
['Female', 52], ['-', .34],
['Non-binary', '-'], ['-', '-'],
['Prefer-not', 2], ['-', .01]],
columns=('category', 'engagement')
)
df = df.replace('-', np.nan)
df['category'] = df['category'].ffill()
Result:
engagement
category
Base 152.00
Female 0.34
Male 0.64
Non-binary NaN
Prefer-not 0.01
Upvotes: 0
Reputation: 30920
We can transform to NaN using Series.mask
and then drop duplicated:
df['Category']=df['Category'].mask(df['Category'].eq('')|df['Category'].isnull()).ffill()
Upvotes: 1
Reputation: 14062
With @ansev's help I have a working version of the code:
#fill whitespaces in 'Category' with values from cell above
df['Category'] = df['Category'].mask(df['Category'].eq('')).ffill()
#drop duplicates in column 'Category' and keep last instance
df = df.drop_duplicates(subset=['Category'], keep='last')
Upvotes: 0