Reputation: 3
From this:
+------+------+--------------------------+-----------------+
| code | type | name | final_component |
+------+------+--------------------------+-----------------+
| C001 | ACT | Exhaust Blower Drive | |
| C001 | AL | | |
| C001 | AL | | |
| C001 | SET | Exhaust Blower Drive | |
| C001 | AL | | |
| C001 | AL | | |
| C001 | AL | | |
| C002 | ACT | Spray Pump Motor 1 Pump | |
| C002 | SET | Spray Pump Motor 1 Pump | |
| C003 | ACT | Spray Pump Motor 2 Pump | |
| C003 | SET | Spray Pump Motor 2 Pump | |
| C004 | ACT | Spray Pump Motor 3 Pump | |
| C004 | SET | Spray Pump Motor 3 Pump | |
+------+------+--------------------------+-----------------+
Expected:
+------+------+--------------------------+--------------------------+
| code | type | name | final_component |
+------+------+--------------------------+--------------------------+
| C001 | ACT | Exhaust Blower Drive | Exhaust Blower Drive |
| C001 | AL | | Exhaust Blower Drive |
| C001 | AL | | Exhaust Blower Drive |
| C001 | SET | Exhaust Blower Drive | Exhaust Blower Drive |
| C001 | AL | | Exhaust Blower Drive |
| C001 | AL | | Exhaust Blower Drive |
| C001 | AL | | Exhaust Blower Drive |
| C002 | ACT | Spray Pump Motor 1 Pump | Spray Pump Motor 1 Pump |
| C002 | SET | Spray Pump Motor 1 Pump | Spray Pump Motor 1 Pump |
| C003 | ACT | Spray Pump Motor 2 Pump | Spray Pump Motor 2 Pump |
| C003 | SET | Spray Pump Motor 2 Pump | Spray Pump Motor 2 Pump |
| C004 | ACT | Spray Pump Motor 3 Pump | Spray Pump Motor 3 Pump |
| C004 | SET | Spray Pump Motor 3 Pump | Spray Pump Motor 3 Pump |
+------+------+--------------------------+--------------------------+
i have to copy the name value whose type is 'SET' to final_component for all the same code like for C001, name for type 'SET' is Exhaust Blower Drive i have to copy that to final_component for all C001
for ind in dataframe.index:
if dataframe['final_component'][ind]!=None:
temp = dataframe['final_component'][ind]
temp_code = dataframe['code'][ind]
i = ind
while dataframe['code'][i] == temp_code:
dataframe['final_component'][ind] = temp
i+=1
i could come up with this but it gets stuck in the while loop
Upvotes: 0
Views: 82
Reputation: 8508
If your data in 'name'
field already has Null values, then you can do something simple as ffill(). Pandas dataframe.ffill() function is used to fill the missing value in the dataframe. ‘ffill’ stands for ‘forward fill’ and will propagate last valid observation forward. In this case, it does not take into consideration the values in code
. If you want to consider that as well, then look at solution 2.
import pandas as pd
import numpy as np
a = {'code':['C001']*7+['C002']*2+['C003']*2+['C004']*2,
'typ':['ACT','AL','AL','SET','AL','AL','AL','ACT','SET','ACT','SET','ACT','SET'],
'name':['Exhaust Blower Drive',None,None,'Exhaust Blower Drive',np.nan,np.nan,np.nan,
'Spray Pump Motor 1 Pump','Spray Pump Motor 1 Pump',
'Spray Pump Motor 2 Pump','Spray Pump Motor 2 Pump',
'Spray Pump Motor 3 Pump','Spray Pump Motor 3 Pump']}
df = pd.DataFrame(a)
#copy all the values from name to final_component' with ffill()
#it will fill the values where data does not exist
#this will work only if you think all values above are part of the same set
df['final_component'] = df['name'].ffill()
If you are required to fill based on value in code, you can use the below solution.
You can do a lookup and then update the values. Try something like this.
import pandas as pd
import numpy as np
a = {'code':['C001']*7+['C002']*2+['C003']*2+['C004']*2,
'typ':['ACT','AL','AL','SET','AL','AL','AL','ACT','SET','ACT','SET','ACT','SET'],
'name':['Exhaust Blower Drive',np.nan,np.nan,'Exhaust Blower Drive',np.nan,np.nan,np.nan,
'Spray Pump Motor 1 Pump','Spray Pump Motor 1 Pump',
'Spray Pump Motor 2 Pump','Spray Pump Motor 2 Pump',
'Spray Pump Motor 3 Pump','Spray Pump Motor 3 Pump']}
df = pd.DataFrame(a)
#copy all the values from name to final_component' including nulls
df['final_component'] = df['name']
#create a sublist of items based on unique values in code
lookup = df[['code', 'final_component']].groupby('code').first()['final_component']
#identify all the null values that need to be replaced
noname=df['final_component'].isnull()
#replace all null values with correct value based on lookup
df['final_component'].loc[noname] = df.loc[noname].apply(lambda x: lookup[x['code']], axis=1)
print(df)
The output will look like this:
code typ name final_component
0 C001 ACT Exhaust Blower Drive Exhaust Blower Drive
1 C001 AL NaN Exhaust Blower Drive
2 C001 AL NaN Exhaust Blower Drive
3 C001 SET Exhaust Blower Drive Exhaust Blower Drive
4 C001 AL NaN Exhaust Blower Drive
5 C001 AL NaN Exhaust Blower Drive
6 C001 AL NaN Exhaust Blower Drive
7 C002 ACT Spray Pump Motor 1 Pump Spray Pump Motor 1 Pump
8 C002 SET Spray Pump Motor 1 Pump Spray Pump Motor 1 Pump
9 C003 ACT Spray Pump Motor 2 Pump Spray Pump Motor 2 Pump
10 C003 SET Spray Pump Motor 2 Pump Spray Pump Motor 2 Pump
11 C004 ACT Spray Pump Motor 3 Pump Spray Pump Motor 3 Pump
12 C004 SET Spray Pump Motor 3 Pump Spray Pump Motor 3 Pump
Upvotes: 2
Reputation: 3001
Here is one approach. First, re-create the data frame:
from io import StringIO
import pandas as pd
data = '''| code | type | name | final_component |
| C001 | ACT | Exhaust Blower Drive | |
| C001 | AL | | |
| C001 | AL | | |
| C001 | SET | Exhaust Blower Drive | |
| C001 | AL | | |
| C001 | AL | | |
| C001 | AL | | |
| C002 | ACT | Spray Pump Motor 1 Pump | |
| C002 | SET | Spray Pump Motor 1 Pump | |
| C003 | ACT | Spray Pump Motor 2 Pump | |
| C003 | SET | Spray Pump Motor 2 Pump | |
| C004 | ACT | Spray Pump Motor 3 Pump | |
| C004 | SET | Spray Pump Motor 3 Pump | |
'''
df = pd.read_csv(StringIO(data), sep='|',)
df = df.drop(columns=['Unnamed: 0', 'Unnamed: 5'])
Now, remove leading and trailing spaces:
# remove leading / trailing spaces
df.columns = [c.strip() for c in df.columns]
for col in df.columns:
if df[col].dtype == object:
df[col] = df[col].str.strip()
And populate final_component
:
# populate 'final component'
df['final_component'] = df['name']
Now replace empty strings with None
and use ffill()
# find final component that is empty string...
mask = df['final_component'] == ''
# ... and convert to None...
df.loc[mask, 'final_component'] = None
# ...so we can use ffill()
df['final_component'] = df['final_component'].ffill()
print(df)
code type name final_component
0 C001 ACT Exhaust Blower Drive Exhaust Blower Drive
1 C001 AL Exhaust Blower Drive
2 C001 AL Exhaust Blower Drive
3 C001 SET Exhaust Blower Drive Exhaust Blower Drive
4 C001 AL Exhaust Blower Drive
5 C001 AL Exhaust Blower Drive
6 C001 AL Exhaust Blower Drive
7 C002 ACT Spray Pump Motor 1 Pump Spray Pump Motor 1 Pump
8 C002 SET Spray Pump Motor 1 Pump Spray Pump Motor 1 Pump
9 C003 ACT Spray Pump Motor 2 Pump Spray Pump Motor 2 Pump
10 C003 SET Spray Pump Motor 2 Pump Spray Pump Motor 2 Pump
11 C004 ACT Spray Pump Motor 3 Pump Spray Pump Motor 3 Pump
12 C004 SET Spray Pump Motor 3 Pump Spray Pump Motor 3 Pump
Upvotes: 1