Reputation: 2517
I have a DataFrame df1
with columns logfile, pos, category, value
.
Example:
df1 = pd.DataFrame({'logfile': ['log1.log', 'log1.log', 'log1.log', 'log2.log'],
'pos': [1, 2, 3, 1],
'category': ['cat1', 'cat2', 'cat1', 'cat2'],
'value' : [0.1, 0.2, 0.3, 0.4]})
I compute on df1
a count operation for each logfile, position and category:
res1 = df1.groupby(['logfile', 'pos', 'category']).agg(['count'])
I would like to join the resulting DataFrame res
to another DataFrame df2
which has columns logfile, pos, calc
.
Example:
df2 = pd.DataFrame({'logfile': ['log1.log','log2.log'],
'pos': [1, 2],
'calc': [0.7, 0.9]})
Following what I found in similar questions, I do:
res1.reset_index().merge(df2, on=['logfile', 'pos'])
However, now I get as a result a DataFrame with columns logfile, pos, (logfile,), (pos,), (category,), (value,count), calc
. I somehow cannot access the columns by reasonable names.
If I, instead, remove reset_index
before merging, the category
column is gone.
What I would like to have is a DataFrame with normal column names, but without loosing the column category
.
I have Pandas in version 0.25 using in Jupyter Notebook.
Upvotes: 1
Views: 48
Reputation: 294358
When you did your first groupby
, you didn't focus on the value
column and you aggregated with a list. This left you with a MultiIndex
column object.
res1
value
count
logfile pos category
log1.log 1 cat1 1
2 cat2 1
3 cat1 1
log2.log 1 cat2 1
You could have
df1.groupby(['logfile', 'pos', 'category']).agg('count')
# df1.groupby(['logfile', 'pos', 'category']).count()
value
logfile pos category
log1.log 1 cat1 1
2 cat2 1
3 cat1 1
log2.log 1 cat2 1
OR
df1.groupby(['logfile', 'pos', 'category']).value.agg(['count'])
count
logfile pos category
log1.log 1 cat1 1
2 cat2 1
3 cat1 1
log2.log 1 cat2 1
This would leave you with a "Single" Index object
res1 = df1.groupby(['logfile', 'pos', 'category']).value.agg(['count'])
res1.merge(df2, on=['logfile', 'pos'])
logfile pos count calc
0 log1.log 1 1 0.7
OR
res1 = df1.groupby(['logfile', 'pos', 'category']).count()
res1.merge(df2, on=['logfile', 'pos'])
logfile pos value calc
0 log1.log 1 1 0.7
Upvotes: 2
Reputation: 3103
You can simply use as_index=False
in the groupby
operation.
res1 = df1.groupby(['logfile', 'pos', 'category'], as_index=False).agg('count')
res1.merge(df2, on=['logfile', 'pos'], suffixes=('_old', '_new'))
Update With your provided examples:
df1.groupby(['logfile', 'pos', 'category'], as_index=False).agg('count').merge(df2, on=['logfile', 'pos'])
returns
logfile pos category value calc
0 log1.log 1 cat1 1 0.7
Upvotes: 2