How to clean this data

From this:


+------+------+--------------------------+-----------------+
| code | type |           name           | final_component |
+------+------+--------------------------+-----------------+
| C001 | ACT  | Exhaust Blower Drive     |                 |
| C001 | AL   |                          |                 |
| C001 | AL   |                          |                 |
| C001 | SET  | Exhaust Blower Drive     |                 |
| C001 | AL   |                          |                 |
| C001 | AL   |                          |                 |
| C001 | AL   |                          |                 |
| C002 | ACT  | Spray Pump Motor 1 Pump  |                 |
| C002 | SET  | Spray Pump Motor 1 Pump  |                 |
| C003 | ACT  | Spray Pump Motor 2 Pump  |                 |
| C003 | SET  | Spray Pump Motor 2 Pump  |                 |
| C004 | ACT  | Spray Pump Motor 3 Pump  |                 |
| C004 | SET  | Spray Pump Motor 3 Pump  |                 |
+------+------+--------------------------+-----------------+


Expected:

+------+------+--------------------------+--------------------------+
| code | type |           name           |     final_component      |
+------+------+--------------------------+--------------------------+
| C001 | ACT  | Exhaust Blower Drive     | Exhaust Blower Drive     |
| C001 | AL   |                          | Exhaust Blower Drive     |
| C001 | AL   |                          | Exhaust Blower Drive     |
| C001 | SET  | Exhaust Blower Drive     | Exhaust Blower Drive     |
| C001 | AL   |                          | Exhaust Blower Drive     |
| C001 | AL   |                          | Exhaust Blower Drive     |
| C001 | AL   |                          | Exhaust Blower Drive     |
| C002 | ACT  | Spray Pump Motor 1 Pump  | Spray Pump Motor 1 Pump  |
| C002 | SET  | Spray Pump Motor 1 Pump  | Spray Pump Motor 1 Pump  |
| C003 | ACT  | Spray Pump Motor 2 Pump  | Spray Pump Motor 2 Pump  |
| C003 | SET  | Spray Pump Motor 2 Pump  | Spray Pump Motor 2 Pump  |
| C004 | ACT  | Spray Pump Motor 3 Pump  | Spray Pump Motor 3 Pump  |
| C004 | SET  | Spray Pump Motor 3 Pump  | Spray Pump Motor 3 Pump  |
+------+------+--------------------------+--------------------------+

i have to copy the name value whose type is 'SET' to final_component for all the same code like for C001, name for type 'SET' is Exhaust Blower Drive i have to copy that to final_component for all C001

for ind in dataframe.index:         
    if dataframe['final_component'][ind]!=None:
        temp = dataframe['final_component'][ind]
        temp_code = dataframe['code'][ind]
    i = ind
    while dataframe['code'][i] == temp_code:
        dataframe['final_component'][ind] = temp
        i+=1

i could come up with this but it gets stuck in the while loop

Upvotes: 0

Views: 82

Answers (2)

Joe Ferndz
Joe Ferndz

Reputation: 8508

Solution 1: when data is grouped in order

If your data in 'name' field already has Null values, then you can do something simple as ffill(). Pandas dataframe.ffill() function is used to fill the missing value in the dataframe. ‘ffill’ stands for ‘forward fill’ and will propagate last valid observation forward. In this case, it does not take into consideration the values in code. If you want to consider that as well, then look at solution 2.

import pandas as pd
import numpy as np

a = {'code':['C001']*7+['C002']*2+['C003']*2+['C004']*2,
     'typ':['ACT','AL','AL','SET','AL','AL','AL','ACT','SET','ACT','SET','ACT','SET'],
     'name':['Exhaust Blower Drive',None,None,'Exhaust Blower Drive',np.nan,np.nan,np.nan,
             'Spray Pump Motor 1 Pump','Spray Pump Motor 1 Pump',
             'Spray Pump Motor 2 Pump','Spray Pump Motor 2 Pump',
             'Spray Pump Motor 3 Pump','Spray Pump Motor 3 Pump']}

df = pd.DataFrame(a)

#copy all the values  from name to final_component' with ffill()
#it will fill the values where data does not exist
#this will work only if you think all values above are part of the same set

df['final_component'] = df['name'].ffill()

Solution 2: when data has to be based on another column value

If you are required to fill based on value in code, you can use the below solution.

You can do a lookup and then update the values. Try something like this.

import pandas as pd
import numpy as np
a = {'code':['C001']*7+['C002']*2+['C003']*2+['C004']*2,
     'typ':['ACT','AL','AL','SET','AL','AL','AL','ACT','SET','ACT','SET','ACT','SET'],
     'name':['Exhaust Blower Drive',np.nan,np.nan,'Exhaust Blower Drive',np.nan,np.nan,np.nan,
             'Spray Pump Motor 1 Pump','Spray Pump Motor 1 Pump',
             'Spray Pump Motor 2 Pump','Spray Pump Motor 2 Pump',
             'Spray Pump Motor 3 Pump','Spray Pump Motor 3 Pump']}

df = pd.DataFrame(a)

#copy all the values  from name to final_component' including nulls
df['final_component'] = df['name']
#create a sublist of items based on unique values in code
lookup = df[['code', 'final_component']].groupby('code').first()['final_component']
#identify all the null values that need to be replaced
noname=df['final_component'].isnull()
#replace all null values with correct value based on lookup
df['final_component'].loc[noname] = df.loc[noname].apply(lambda x: lookup[x['code']], axis=1)

print(df)

The output will look like this:

    code  typ                     name          final_component
0   C001  ACT     Exhaust Blower Drive     Exhaust Blower Drive
1   C001   AL                      NaN     Exhaust Blower Drive
2   C001   AL                      NaN     Exhaust Blower Drive
3   C001  SET     Exhaust Blower Drive     Exhaust Blower Drive
4   C001   AL                      NaN     Exhaust Blower Drive
5   C001   AL                      NaN     Exhaust Blower Drive
6   C001   AL                      NaN     Exhaust Blower Drive
7   C002  ACT  Spray Pump Motor 1 Pump  Spray Pump Motor 1 Pump
8   C002  SET  Spray Pump Motor 1 Pump  Spray Pump Motor 1 Pump
9   C003  ACT  Spray Pump Motor 2 Pump  Spray Pump Motor 2 Pump
10  C003  SET  Spray Pump Motor 2 Pump  Spray Pump Motor 2 Pump
11  C004  ACT  Spray Pump Motor 3 Pump  Spray Pump Motor 3 Pump
12  C004  SET  Spray Pump Motor 3 Pump  Spray Pump Motor 3 Pump

Upvotes: 2

jsmart
jsmart

Reputation: 3001

Here is one approach. First, re-create the data frame:

from io import StringIO
import pandas as pd

data = '''| code | type |           name           | final_component |
| C001 | ACT  | Exhaust Blower Drive     |                 |
| C001 | AL   |                          |                 |
| C001 | AL   |                          |                 |
| C001 | SET  | Exhaust Blower Drive     |                 |
| C001 | AL   |                          |                 |
| C001 | AL   |                          |                 |
| C001 | AL   |                          |                 |
| C002 | ACT  | Spray Pump Motor 1 Pump  |                 |
| C002 | SET  | Spray Pump Motor 1 Pump  |                 |
| C003 | ACT  | Spray Pump Motor 2 Pump  |                 |
| C003 | SET  | Spray Pump Motor 2 Pump  |                 |
| C004 | ACT  | Spray Pump Motor 3 Pump  |                 |
| C004 | SET  | Spray Pump Motor 3 Pump  |                 |
'''
df = pd.read_csv(StringIO(data), sep='|',)
df = df.drop(columns=['Unnamed: 0', 'Unnamed: 5'])

Now, remove leading and trailing spaces:

# remove leading / trailing spaces
df.columns = [c.strip() for c in df.columns]
for col in df.columns:
    if df[col].dtype == object:
        df[col] = df[col].str.strip()

And populate final_component:

# populate 'final component'
df['final_component'] = df['name']

Now replace empty strings with None and use ffill()

# find final component that is empty string...
mask = df['final_component'] == ''

# ... and convert to None...
df.loc[mask, 'final_component'] = None

# ...so we can use ffill()
df['final_component'] = df['final_component'].ffill()
print(df)

    code type                     name          final_component
0   C001  ACT     Exhaust Blower Drive     Exhaust Blower Drive
1   C001   AL                              Exhaust Blower Drive
2   C001   AL                              Exhaust Blower Drive
3   C001  SET     Exhaust Blower Drive     Exhaust Blower Drive
4   C001   AL                              Exhaust Blower Drive
5   C001   AL                              Exhaust Blower Drive
6   C001   AL                              Exhaust Blower Drive
7   C002  ACT  Spray Pump Motor 1 Pump  Spray Pump Motor 1 Pump
8   C002  SET  Spray Pump Motor 1 Pump  Spray Pump Motor 1 Pump
9   C003  ACT  Spray Pump Motor 2 Pump  Spray Pump Motor 2 Pump
10  C003  SET  Spray Pump Motor 2 Pump  Spray Pump Motor 2 Pump
11  C004  ACT  Spray Pump Motor 3 Pump  Spray Pump Motor 3 Pump
12  C004  SET  Spray Pump Motor 3 Pump  Spray Pump Motor 3 Pump

Upvotes: 1

Related Questions