Reputation: 303
I just want to clean the dataframe and analyse the dataframe. However, I got in trouble. I created a simple dataframe to illustrate it:
import pandas as pd
d = {'Resutls': ['IIL', 'pass','pass','IIH','pass','IIL','pass'], 'part':['None',1,2,'None',5,'None',4] }
df = pd.DataFrame(d)
the result looks like:
Resutls part
0 IIL None
1 pass 1
2 pass 2
3 IIH None
4 pass 5
5 IIL None
6 pass 4
There are some repeatable modules in the dataframe. I just want to reorder the dataframe by rows and drop the duplicated ones like:
Resutls part
0 IIL None
1 pass 1
2 pass 2
6 pass 4
3 IIH None
4 pass 5
or just split the dataframe into several sub dataframes:
Resutls part
0 IIL None
1 pass 1
2 pass 2
3 pass 4
Resutls part
0 IIH None
1 pass 5
This is just an easy example what I want to do. Actually I have a 4000-thousand rows dataframe, I tried to use reindex or df.iloc to do this. It is intuitive for me but seems a little complicated to achieve. Is there any good way to do this? Please advise.
Upvotes: 1
Views: 107
Reputation: 862921
I think you need replace pass
to NaN
s and use forward filling, then sort by argsort
and reorder by iloc
:
df = df.iloc[df['Resutls'].mask(df['Resutls'].eq('pass')).ffill().argsort()]
print (df)
Resutls part
3 IIH None
4 pass 5
0 IIL None
1 pass 1
2 pass 2
5 IIL None
6 pass 4
Last remove repeating rows by boolean indexing
:
df = df[~df['Resutls'].duplicated() | (df['Resutls'] == 'pass')]
print (df)
Resutls part
3 IIH None
4 pass 5
0 IIL None
1 pass 1
2 pass 2
6 pass 4
If want each DataFrame separately:
df['g'] = df['Resutls'].mask(df['Resutls'].eq('pass')).ffill()
df = df[~df['Resutls'].duplicated() | (df['Resutls'] == 'pass')]
print (df)
Resutls part g
0 IIL None IIL
1 pass 1 IIL
2 pass 2 IIL
3 IIH None IIH
4 pass 5 IIH
6 pass 4 IIL
dfs = {k:v.drop('g', axis=1) for k, v in df.groupby('g')}
#print (dfs)
print (dfs['IIH'])
Resutls part
3 IIH None
4 pass 5
print (dfs['IIL'])
Resutls part
0 IIL None
1 pass 1
2 pass 2
6 pass 4
Upvotes: 1