Reputation: 12320
I have the dataset below:
host       infra  domain  user  entity  bios  ip
abcd.com   Lake   Hov     bat   sos     nvm   x.x.x.x
kpm.com    Lake   Hov     bat   sos     nvm   x.x.x.x
ffger.com  Data   JOV     sat   sim     nvm   x.x.x.x
ffger.com  Data   TOV     sat   sim     nvm   NAN
kko.com    Lake   POV     et    som     nvm   NAN
spm.com    Lake   Hov     bat   sos     nvm   NAN
I want to group by infra and domain, and each group should keep all of its rows, including those where ip is NAN.
For example, one of the DataFrames produced by df.groupby(['infra', 'domain']) would look like this:
abcd.com   Lake   Hov     bat   sos     nvm   x.x.x.x
kpm.com    Lake   Hov     bat   sos     nvm   x.x.x.x
spm.com    Lake   Hov     bat   sos     nvm   NAN
Upvotes: 0
Views: 79
Reputation: 118
I think you can use the dropna parameter of groupby for this. It was added in pandas 1.1 and controls whether rows with NaN in the grouping key are kept.
pd.__version__
# '1.1.0.dev0+2004.g8d10bfb6f'

# Example from the docs
df
   a    b  c
0  1  2.0  3
1  1  NaN  4
2  2  1.0  3
3  1  2.0  2

# without NA (the default)
df.groupby('b').sum()
     a  c
b
1.0  2  3
2.0  2  5

# with NA
df.groupby('b', dropna=False).sum()
     a  c
b
1.0  2  3
2.0  2  5
NaN  1  4
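Note that in the question's data the NAN values are in ip, which is not a grouping key, so groupby keeps those rows either way; dropna only matters when a key column itself holds NaN. A minimal sketch of the difference, assuming the question's table is rebuilt by hand with four of its columns, that NAN stands for a real missing value (np.nan), and that one domain value is blanked out purely for illustration:

```python
import numpy as np
import pandas as pd

# Hand-built subset of the question's table (assumption: NAN means np.nan)
df = pd.DataFrame({
    "host": ["abcd.com", "kpm.com", "ffger.com", "ffger.com", "kko.com", "spm.com"],
    "infra": ["Lake", "Lake", "Data", "Data", "Lake", "Lake"],
    "domain": ["Hov", "Hov", "JOV", "TOV", "POV", "Hov"],
    "ip": ["x.x.x.x", "x.x.x.x", "x.x.x.x", np.nan, np.nan, np.nan],
})

# NaN in a non-key column (ip) does not drop any rows from the groups
total_rows = int(df.groupby(["infra", "domain"]).size().sum())

# Hypothetical variant: put NaN into a key column to see what dropna changes
with_nan_key = df.copy()
with_nan_key.loc[4, "domain"] = np.nan

default_total = int(with_nan_key.groupby("domain").size().sum())             # NaN key dropped
kept_total = int(with_nan_key.groupby("domain", dropna=False).size().sum())  # NaN key kept

print(total_rows, default_total, kept_total)  # 6 5 6
```

So dropna=False is only needed if infra or domain can be missing; it requires pandas 1.1 or later.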
Upvotes: 0
Reputation: 862511
You can create a dictionary of DataFrames, keyed by the joined group names:
dfs = {"_".join(name): g for name, g in df.groupby(['infra', 'domain'])}
print(dfs['Lake_Hov'])
       host infra domain user entity bios       ip
0  abcd.com  Lake    Hov  bat    sos  nvm  x.x.x.x
1   kpm.com  Lake    Hov  bat    sos  nvm  x.x.x.x
5   spm.com  Lake    Hov  bat    sos  nvm      NAN
If you need to loop over the groups (sort=False keeps them in order of first appearance):
for name, g in df.groupby(['infra', 'domain'], sort=False):
    print(g)
       host infra domain user entity bios       ip
0  abcd.com  Lake    Hov  bat    sos  nvm  x.x.x.x
1   kpm.com  Lake    Hov  bat    sos  nvm  x.x.x.x
5   spm.com  Lake    Hov  bat    sos  nvm      NAN
        host infra domain user entity bios       ip
2  ffger.com  Data    JOV  sat    sim  nvm  x.x.x.x
        host infra domain user entity bios   ip
3  ffger.com  Data    TOV  sat    sim  nvm  NAN
      host infra domain user entity bios   ip
4  kko.com  Lake    POV   et    som  nvm  NAN
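If only one group is needed, get_group on the groupby object avoids building the whole dictionary. A self-contained sketch, assuming the question's table is rebuilt by hand and that NAN stands for a real missing value (np.nan); the variable names are only illustrative:

```python
import numpy as np
import pandas as pd

# Hand-built copy of the question's table (assumption: NAN means np.nan)
df = pd.DataFrame({
    "host": ["abcd.com", "kpm.com", "ffger.com", "ffger.com", "kko.com", "spm.com"],
    "infra": ["Lake", "Lake", "Data", "Data", "Lake", "Lake"],
    "domain": ["Hov", "Hov", "JOV", "TOV", "POV", "Hov"],
    "user": ["bat", "bat", "sat", "sat", "et", "bat"],
    "entity": ["sos", "sos", "sim", "sim", "som", "sos"],
    "bios": ["nvm"] * 6,
    "ip": ["x.x.x.x", "x.x.x.x", "x.x.x.x", np.nan, np.nan, np.nan],
})

grouped = df.groupby(["infra", "domain"], sort=False)

# With several key columns, the group name is a tuple of key values
lake_hov = grouped.get_group(("Lake", "Hov"))
print(lake_hov)
```

The row for spm.com stays in the Lake/Hov group even though its ip is missing, because ip is not one of the grouping keys.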
Upvotes: 1