Tanmay Lata

Reputation: 263

Merging a DataFrame across Rows while dropping the NaN values

I have this data frame

Data Frame Achieved

I achieved this by writing this code

import pandas as pd

df = pd.DataFrame(columns = ['Step Number' , 'CAN_Send' , 'CAN_Values'])
can = [{'ta1': ('atpcinfolamp_co', '3')}, {'ta2': ('xyz_signal', '4')}, {'ta2': ('abc_signal', '5')}]
keys = []
for can_signals in can:
    for key,value in can_signals.items():
        if key not in keys:
            keys.append(key)
            df = df.append({'Step Number' : key} , ignore_index = True)
            df = df.append({'CAN_Send' : value[0]} , ignore_index = True)
            df = df.append({'CAN_Values' : value[1]} , ignore_index = True)
        else:
            df = df.append({'CAN_Send' : value[0]} , ignore_index = True)
            df = df.append({'CAN_Values' : value[1]} , ignore_index = True)
df
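(Side note: `DataFrame.append` was deprecated in pandas 1.4 and removed in 2.0, so the snippet above will not run on current pandas. A sketch that builds the same NaN-structured frame by collecting row dicts and concatenating once at the end; the variable names `pieces` and `seen` are my own, not from the original code:)

```python
import pandas as pd

can = [{'ta1': ('atpcinfolamp_co', '3')},
       {'ta2': ('xyz_signal', '4')},
       {'ta2': ('abc_signal', '5')}]

pieces, seen = [], set()
for can_signals in can:
    for key, (send, val) in can_signals.items():
        if key not in seen:
            # first time this step appears: emit a row holding only the key
            seen.add(key)
            pieces.append({'Step Number': key})
        # each signal contributes two single-value rows, as in the original
        pieces.append({'CAN_Send': send})
        pieces.append({'CAN_Values': val})

df = pd.concat([pd.DataFrame([p]) for p in pieces], ignore_index=True)
df = df.reindex(columns=['Step Number', 'CAN_Send', 'CAN_Values'])
```

This reproduces the same 8-row frame with NaNs in the unfilled cells.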

I need a dataframe that looks something like this: Required Output Dataframe

I cannot work out how to merge the rows while dropping the NaNs at the same time.

I tried something like

df = df.groupby('Step Number')[['CAN_Send' , 'CAN_Values']]

but this did not work: since my values are strings, there is no numerical aggregation to turn the GroupBy object back into a DataFrame, and any method I use to drop NaNs ends up clearing my entire dataframe.

Any help in this regard is very much appreciated!

Thanks in advance!

Upvotes: 1

Views: 70

Answers (1)

SeaBean

Reputation: 23217

You can fill in the missing values of Step Number first with .ffill(), then groupby() on Step Number and agg() the remaining 2 columns with dropna(), as follows:

df['Step Number'] = df['Step Number'].ffill()

df_out = (df.groupby('Step Number', as_index=False)
            .agg(lambda x: x.dropna(how='all'))
            .apply(pd.Series.explode)
         )

Result:

print(df_out)

  Step Number         CAN_Send CAN_Values
0         ta1  atpcinfolamp_co          3
1         ta2       xyz_signal          4
1         ta2       abc_signal          5
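(In case the .apply(pd.Series.explode) step looks opaque: after the agg(), a group with more than one entry holds a list-like in each cell, and explode unpacks those into separate rows. A minimal sketch of just that step, using a hand-built one-group frame of my own, not an intermediate from the pipeline above:)

```python
import pandas as pd

# one group ('ta2') whose aggregated cells each hold a list of two values
grouped = pd.DataFrame({'CAN_Send': [['xyz_signal', 'abc_signal']],
                        'CAN_Values': [['4', '5']]},
                       index=['ta2'])

# explode each column: every list element gets its own row,
# with the group label repeated in the index
out = grouped.apply(pd.Series.explode)
```

`out` now has two rows, both labelled `ta2`, with the paired signal/value entries side by side.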

Edit

For your new dataset, you can use the following code. It also works for the previous dataset and should work in general for the structures created by your program logic.

df['Step Number'] = df['Step Number'].ffill()
df['CAN_Send'] = df['CAN_Send'].ffill(limit=1)
df['CAN_Values'] = df['CAN_Values'].bfill(limit=1)
df = df.dropna().drop_duplicates()

Demo

Data Preparation:

Your code is fine-tuned here to make your logic work in general. Otherwise, if a key appears more than once with another key in between (e.g. the keys appear in the sequence ta1, ta2, ta1), your existing logic will fail to add a new Step Number row for that key (the last ta1), since it already exists in the list keys:

df = pd.DataFrame(columns = ['Step Number' , 'CAN_Send' , 'CAN_Values'])
#can = [{'ta1': ('atpcinfolamp_co', '3')}, {'ta2': ('xyz_signal', '4')}, {'ta2': ('abc_signal', '5')}]
can = [{'ta1': ('atpcinfolamp_co', '3')}, {'ta1': ('hdcinfolamp_co', '5')}, {'ta2': ('xyz_signal', '4')}, {'ta2': ('abc_signal', '5')}] 
#keys = []
last_key = ''
for can_signals in can:
    for key,value in can_signals.items():
        if key != last_key:
#            keys.append(key)
            last_key = key
            df = df.append({'Step Number' : key} , ignore_index = True)
#            df = df.append({'CAN_Send' : value[0]} , ignore_index = True)
#            df = df.append({'CAN_Values' : value[1]} , ignore_index = True)
#        else:
#            df = df.append({'CAN_Send' : value[0]} , ignore_index = True)
#            df = df.append({'CAN_Values' : value[1]} , ignore_index = True)
        df = df.append({'CAN_Send' : value[0]} , ignore_index = True)
        df = df.append({'CAN_Values' : value[1]} , ignore_index = True)

df

  Step Number         CAN_Send CAN_Values
0         ta1              NaN        NaN
1         NaN  atpcinfolamp_co        NaN
2         NaN              NaN          3
3         NaN   hdcinfolamp_co        NaN
4         NaN              NaN          5
5         ta2              NaN        NaN
6         NaN       xyz_signal        NaN
7         NaN              NaN          4
8         NaN       abc_signal        NaN
9         NaN              NaN          5

Run the new code:

df['Step Number'] = df['Step Number'].ffill()
df['CAN_Send'] = df['CAN_Send'].ffill(limit=1)
df['CAN_Values'] = df['CAN_Values'].bfill(limit=1)
df = df.dropna().drop_duplicates()

Result:

print(df)

  Step Number         CAN_Send CAN_Values
1         ta1  atpcinfolamp_co          3
3         ta1   hdcinfolamp_co          5
6         ta2       xyz_signal          4
8         ta2       abc_signal          5

Edit 2

Actually, given the structure of your source data can, you can arrive at your desired dataframe directly, in a much simpler way:

can = [{'ta1': ('atpcinfolamp_co', '3')}, {'ta1': ('hdcinfolamp_co', '5')}, {'ta2': ('xyz_signal', '4')}, {'ta2': ('abc_signal', '5')}] 

data = {'Step Number': [list(x.keys())[0] for x in can],
        'CAN_Send':    [list(x.values())[0][0] for x in can],
        'CAN_Values':  [list(x.values())[0][1] for x in can]}
df = pd.DataFrame(data)
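(Since each dict in can holds exactly one key, the same flattening can also be written as a single comprehension over .items() that builds one record per row; this is an equivalent sketch of my own, not from the original answer:)

```python
import pandas as pd

can = [{'ta1': ('atpcinfolamp_co', '3')},
       {'ta1': ('hdcinfolamp_co', '5')},
       {'ta2': ('xyz_signal', '4')},
       {'ta2': ('abc_signal', '5')}]

# one record dict per (key, (send, value)) pair
records = [{'Step Number': k, 'CAN_Send': send, 'CAN_Values': val}
           for d in can for k, (send, val) in d.items()]
df = pd.DataFrame(records)
```

This avoids indexing into .keys()/.values() and reads the tuple apart by unpacking.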

Upvotes: 1
