Reputation: 263
I have this data frame
I achieved this by writing this code
df = pd.DataFrame(columns = ['Step Number' , 'CAN_Send' , 'CAN_Values'])
can = [{'ta1': ('atpcinfolamp_co', '3')}, {'ta2': ('xyz_signal', '4')}, {'ta2': ('abc_signal', '5')}]
keys = []
for can_signals in can:
for key,value in can_signals.items():
if key not in keys:
keys.append(key)
df = df.append({'Step Number' : key} , ignore_index = True)
df = df.append({'CAN_Send' : value[0]} , ignore_index = True)
df = df.append({'CAN_Values' : value[1]} , ignore_index = True)
else:
df = df.append({'CAN_Send' : value[0]} , ignore_index = True)
df = df.append({'CAN_Values' : value[1]} , ignore_index = True)
df
I need to have a dataframe that looks something like this
I am unable to crack how to merge across columns while dropping the NaNs at the same time.
I tried something like
df = df.groupby('Step Number')[['CAN_Send' , 'CAN_Values']]
but this did not work as there are no numerical operations to convert the groupby object into a frame as I have string values and any methods to drop NaN ends up clearing my entire dataframe.
Any help in this regard is very much appreciated!
Thanks in advance!
Upvotes: 1
Views: 70
Reputation: 23217
You can fill in the missing values of Step Number
first by .ffill()
. Then groupby()
Step Number
and then aggregate()
the remaining 2 columns with dropna()
, as follows:
df['Step Number'] = df['Step Number'].ffill()
df_out = (df.groupby('Step Number', as_index=False)
.agg(lambda x: x.dropna(how='all'))
.apply(pd.Series.explode)
)
Result:
print(df_out)
Step Number CAN_Send CAN_Values
0 ta1 atpcinfolamp_co 3
1 ta2 xyz_signal 4
1 ta2 abc_signal 5
For your new dataset, you can use the following codes. It works also for the previous dataset and should work in general for the structures created by your program logics.
df['Step Number'] = df['Step Number'].ffill()
df['CAN_Send'] = df['CAN_Send'].ffill(limit=1)
df['CAN_Values'] = df['CAN_Values'].bfill(limit=1)
df = df.dropna().drop_duplicates()
Data Preparation:
Your codes are fine-tuned to make your logics work in general. Otherwise, if you have a key appear more than once but with other key appears in between this key (e.g. keys appear with sequence ta1
, ta2
, ta1
), your existing logics will fail to add a new row for the Step Number
of this key (e.g. the last ta1
) which already exists in the list keys
df = pd.DataFrame(columns = ['Step Number' , 'CAN_Send' , 'CAN_Values'])
#can = [{'ta1': ('atpcinfolamp_co', '3')}, {'ta2': ('xyz_signal', '4')}, {'ta2': ('abc_signal', '5')}]
can = [{'ta1': ('atpcinfolamp_co', '3')}, {'ta1': ('hdcinfolamp_co', '5')}, {'ta2': ('xyz_signal', '4')}, {'ta2': ('abc_signal', '5')}]
#keys = []
last_key = ''
for can_signals in can:
for key,value in can_signals.items():
if key != last_key:
# keys.append(key)
last_key = key
df = df.append({'Step Number' : key} , ignore_index = True)
# df = df.append({'CAN_Send' : value[0]} , ignore_index = True)
# df = df.append({'CAN_Values' : value[1]} , ignore_index = True)
# else:
# df = df.append({'CAN_Send' : value[0]} , ignore_index = True)
# df = df.append({'CAN_Values' : value[1]} , ignore_index = True)
df = df.append({'CAN_Send' : value[0]} , ignore_index = True)
df = df.append({'CAN_Values' : value[1]} , ignore_index = True)
df
Step Number CAN_Send CAN_Values
0 ta1 NaN NaN
1 NaN atpcinfolamp_co NaN
2 NaN NaN 3
3 NaN hdcinfolamp_co NaN
4 NaN NaN 5
5 ta2 NaN NaN
6 NaN xyz_signal NaN
7 NaN NaN 4
8 NaN abc_signal NaN
9 NaN NaN 5
Run new codes:
df['Step Number'] = df['Step Number'].ffill()
df['CAN_Send'] = df['CAN_Send'].ffill(limit=1)
df['CAN_Values'] = df['CAN_Values'].bfill(limit=1)
df = df.dropna().drop_duplicates()
Result:
print(df)
Step Number CAN_Send CAN_Values
1 ta1 atpcinfolamp_co 3
3 ta1 hdcinfolamp_co 5
6 ta2 xyz_signal 4
8 ta2 abc_signal 5
Actually, for the structure of your source data can
, you can directly arrive at your desired dataframe in a much simpler way as follows:
can = [{'ta1': ('atpcinfolamp_co', '3')}, {'ta1': ('hdcinfolamp_co', '5')}, {'ta2': ('xyz_signal', '4')}, {'ta2': ('abc_signal', '5')}]
data = {'Step Number': [list(x.keys())[0] for x in can], 'CAN_Send': [list(x.values())[0][0] for x in can], 'CAN_Values': [list(x.values())[0][1] for x in can]}
df = pd.DataFrame(data)
Upvotes: 1