Reputation: 881
I have the following code, which creates a table and a bar plot with seaborn.
#Building a dataframe grouped by the # of Engagement Types
sales_type = sales.groupby('# of Engagement Types').sum()
#Calculating the % of people who bought the course by # engagement types
sales_type['% Sales per Participants'] = round(100*(sales_type['Sales'] / sales_type['Had an Engagement']), 2)
#Calculating the # of people who didn't have any engagements
sales_type.set_value(index=0, col='Had an Engagement', value=sales[sales['Had an Engagement']==0].count()['Sales'])
#Calculating the % of sales for those who didn't have any engagements
sales_type.set_value(index=0, col='% Sales per Participants',
value=round(100 * (sales_type.ix[0, 'Sales'] /
sales[sales['Had an Engagement']==0].count()['Sales']),2))
#Setting the graph image
fig, (ax1) = plt.subplots(nrows=1, ncols=1, figsize=(12,4))
sns.set_style("whitegrid")
# Ploting the histagram for the % of total prospects
ax1 = sns.barplot(x=sales_type.index,y='% Sales per Participants', data=sales_type ,ax=ax1)
ax1.set(ylabel = '%')
ax1.set_title('% Sales per Participants By # of Engagement Types')
#present the table
sales_type.xs(['Had an Engagement', 'Sales','% Sales per Participants'],axis=1).transpose()
#sales_type
I'm using the same approach for other parameters with no issue. However, for one parameter I get the error "ValueError: Grouper for '<class 'pandas.core.frame.DataFrame'>' not 1-dimensional" on this line:
ax1 = sns.barplot(x=sales_type.index,y='% Sales per Participants', data=sales_type ,ax=ax1)
This error occurs although the dataframe doesn't have more than one dimension.
This is the head of the table:
Sales Pre-Ordered / Ordered Book \
# of Engagement Types
0 1.0 0.0
1 20.0 496.0
2 51.0 434.0
3 82.0 248.0
4 71.0 153.0
5 49.0 97.0
6 5.0 24.0
Opted In For / Clicked to Kindle Viewed PLC \
# of Engagement Types
0 0.0 0
1 27034.0 5920
2 6953.0 6022
3 1990.0 1958
4 714.0 746
5 196.0 204
6 24.0 24
# of PLC Engagement Viewed Webinar \
# of Engagement Types
0 0.0 0
1 6434.0 1484
2 7469.0 1521
3 2940.0 1450
4 1381.0 724
5 463.0 198
6 54.0 24
# of Webinars (Live/Replay) \
# of Engagement Types
0 0.0
1 1613.0
2 1730.0
3 1768.0
4 1018.0
5 355.0
6 45.0
OCCC Facebook Group Member Engaged in Cart-Open \
# of Engagement Types
0 0.0 0
1 148.0 160
2 498.0 1206
3 443.0 967
4 356.0 511
5 168.0 177
6 24.0 24
# of Engagement at Cart Open Had an Engagement \
# of Engagement Types
0 0.0 3387
1 189.0 35242
2 1398.0 8317
3 1192.0 2352
4 735.0 801
5 269.0 208
6 40.0 24
Total # of Engagements % Sales per Participants
# of Engagement Types
0 0.0 0.03
1 35914.0 0.06
2 18482.0 0.61
3 8581.0 3.49
4 4357.0 8.86
5 1548.0 23.56
6 211.0 20.83
This is the full error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-211-f0185fe64c1a> in <module>()
12 sns.set_style("whitegrid")
13 # Ploting the histagram for the % of total prospects
---> 14 ax1 = sns.barplot(x=sales_type.index,y='% Sales per Participants', data=sales_type ,ax=ax1)
15 ax1.set(ylabel = '%')
16 ax1.set_title('% Sales per Participants By # of Engagement Types')
ValueError: Grouper for '<class 'pandas.core.frame.DataFrame'>' not 1-dimensional
I've searched the internet and Stack Overflow for this error, but found nothing. Does anyone have an idea what's going on?
Upvotes: 87
Views: 151327
Reputation: 2479
TL;DR:
Quick example: if I group a bunch of people by career, each person is either an eng or a tech, not both; otherwise groupby() won't know whether to put that person in the tech group or the eng group.
Your code, unfortunately, assigned some people to both eng AND tech at the same time.
First, let's look at what groupby() does. We will be using this example fruit df, shown here:
import pandas as pd
import numpy as np
df = pd.DataFrame(
{"fruit": ['apple', 'apple', 'orange', 'orange'],
"color": ['r', 'g', 'b', 'r']},
index=[11, 22, 33, 44],
)
>>> df
+----+---------+---------+
| | fruit | color |
|----+---------+---------|
| 11 | apple | r |
| 22 | apple | g |
| 33 | orange | b |
| 44 | orange | r |
+----+---------+---------+
Observe a perfectly valid df.groupby() below, deviating from typical usage:
gp = df.groupby(
{
0: 'mine',
1: 'mine',
11: 'mine',
22: 'mine',
33: 'mine',
44: 'you are rats with wings!',
}
)
>>> gp.get_group('mine')
+----+---------+---------+
| | fruit | color |
|----+---------+---------|
| 11 | apple | r |
| 22 | apple | g |
| 33 | orange | b |
+----+---------+---------+
>>> gp.get_group('you are rats with wings!')
+----+---------+---------+
| | fruit | color |
|----+---------+---------|
| 44 | orange | r |
+----+---------+---------+
Wait, the groupby() didn't even use 'fruit' or 'color' at all?!
That's right! groupby() doesn't need to care about df or 'fruit' or 'color' or Nemo. groupby() only cares about one thing: a lookup table that tells it which index belongs to which group.
In this case, for example, the dictionary passed to groupby() is instructing groupby():
If you see index 11, then it is a "mine"; put the row with that index in the group named "mine".
If you see index 22, then it is a "mine"; put the row with that index in the group named "mine".
Even 0 and 1 not being in df.index is not a problem.
Conventional df.groupby('fruit') or df.groupby(df['fruit']) follows exactly the rule above. The column df['fruit'] is used as a lookup table; it tells groupby() that index 11 is an "apple".
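A minimal sketch of the lookup-table idea, reusing the fruit df from this answer: grouping by the column name, by the column itself, or by an equivalent dictionary should give the same groups. The names groups_of and lookup are made up here for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {"fruit": ["apple", "apple", "orange", "orange"],
     "color": ["r", "g", "b", "r"]},
    index=[11, 22, 33, 44],
)

# Hypothetical helper: list the index labels that land in each group.
def groups_of(grouped):
    return {name: list(labels) for name, labels in grouped.groups.items()}

# The same index -> label lookup table, written out as a plain dictionary.
lookup = df["fruit"].to_dict()

by_name = groups_of(df.groupby("fruit"))
by_column = groups_of(df.groupby(df["fruit"]))
by_dict = groups_of(df.groupby(lookup))

print(by_name)
print(by_name == by_column == by_dict)
```

All three calls hand groupby() one label per index, which is the only thing it needs.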
Grouper for '<class 'pandas.core.frame.DataFrame'>' not 1-dimensional
What it is saying is really: "for some or all indexes in df, you are assigning MORE THAN just one label"
Let's examine some possible errors using the above example:
[x] df.groupby(df) will not work: you gave groupby() a 2D mapping, so each index was given 2 group names. It will complain: 'is index 11 an "apple" or an "r"? make up your mind!'
[x] The code below will also not work. Although the mapping is now 1D, it maps index 11 to "mine" as well as "yours". Pandas' df and sr allow a non-unique index, so be careful.
mapping = pd.DataFrame(index= [ 11, 11, 22, 33, 44 ],
data = ['mine', 'yours', 'mine', 'mine', 'yours'], )
df.groupby(mapping)
# different error message, but same idea
mapping = pd.Series( index= [ 11, 11, 22, 33, 44 ],
data = ['mine', 'yours', 'mine', 'mine', 'yours'], )
df.groupby(mapping)
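To see these failure modes side by side, here is a small sketch that tries a valid key and two invalid ones and records the error text. The helper groupby_error and the variable names are assumptions made for this example; the exact wording of the messages may differ between pandas versions.

```python
import pandas as pd

df = pd.DataFrame(
    {"fruit": ["apple", "apple", "orange", "orange"],
     "color": ["r", "g", "b", "r"]},
    index=[11, 22, 33, 44],
)

# Hypothetical helper: return the error text if groupby rejects the key,
# or None if the key is accepted.
def groupby_error(frame, key):
    try:
        frame.groupby(key)
    except ValueError as exc:
        return str(exc)
    return None

# One label per index: fine.
valid_lookup = {11: "mine", 22: "mine", 33: "mine", 44: "yours"}
# Index 11 appears twice, so it is given two labels.
duplicated_lookup = pd.Series(
    ["mine", "yours", "mine", "mine", "yours"],
    index=[11, 11, 22, 33, 44],
)

results = {
    "valid dict": groupby_error(df, valid_lookup),
    "whole DataFrame": groupby_error(df, df),
    "Series with duplicate index": groupby_error(df, duplicated_lookup),
}
for name, error in results.items():
    print(name, "->", error)
```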
Upvotes: 20
Reputation: 32214
I also ran into this problem and found that it was caused by duplicate column names.
To recreate this:
df = pd.DataFrame({"foo": [1,2,3], "bar": [1,2,3]})
df.rename(columns={'foo': 'bar'}, inplace=True)
bar bar
0 1 1
1 2 2
2 3 3
df.groupby('bar')
ValueError: Grouper for 'bar' not 1-dimensional
Just like a lot of cryptic pandas errors, this one too stems from having two columns with the same name.
Figure out which one you want to use, rename or drop the other column and redo the operation.
Rename the columns like this
df.columns = ['foo', 'bar']
foo bar
0 1 1
1 2 2
2 3 3
df.groupby('bar')
<pandas.core.groupby.DataFrameGroupBy object at 0x1066dd950>
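If it is not obvious which names are duplicated, something along these lines may help to find them. This is only a sketch: the frame is built directly with two "bar" columns, and keeping the first copy of each duplicated column is an assumption that may not suit your data.

```python
import pandas as pd

# A frame with two columns that share the name "bar".
df = pd.DataFrame([[1, 1], [2, 2], [3, 3]], columns=["bar", "bar"])

# Names that occur more than once in the column index.
duplicated_names = df.columns[df.columns.duplicated()].tolist()
print(duplicated_names)

# Grouping by the ambiguous name fails, because df["bar"] is a DataFrame.
try:
    df.groupby("bar")
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)

# One possible fix: keep only the first column for each name.
deduped = df.loc[:, ~df.columns.duplicated()]
group_sizes = deduped.groupby("bar").size().to_dict()
print(group_sizes)
```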
Upvotes: 162
Reputation: 658
Fix the problem by correcting the column names first. The column names probably weren't a one-dimensional list when you passed them in. You can do:
column_name = ["foo", "bar"]
df = pd.DataFrame(values, columns=column_name)
# then groupby again
df.groupby("bar")
Upvotes: 0
Reputation: 1611
This happened to me when I called pivot_table on df instead of pd, as in:
df.pivot_table(df[["....
instead of
pd.pivot_table(df[["...
Upvotes: 4
Reputation: 940
Something to add to @w-m's answer.
If you are adding multiple columns from one dataframe to another:
df1[['col1', 'col2']] = df2[['col1', 'col2']]
it will create a multi-column index, and if you try to group by anything on df1, it will give you this error.
To solve this, get rid of the multi-index by using
df1.columns = df1.columns.get_level_values(0)
Upvotes: 7
Reputation: 11232
Happened to me when I accidentally created MultiIndex columns:
>>> values = np.asarray([[1, 1], [2, 2], [3, 3]])
# notice accidental double brackets around column list
>>> df = pd.DataFrame(values, columns=[["foo", "bar"]])
# prints very innocently
>>> df
foo bar
0 1 1
1 2 2
2 3 3
# but throws this error
>>> df.groupby("foo")
ValueError: Grouper for 'foo' not 1-dimensional
# cause:
>>> df.columns
MultiIndex(levels=[['bar', 'foo']],
labels=[[1, 0]])
# fix by using correct columns list
>>> df = pd.DataFrame(values, columns=["foo", "bar"])
>>> df.groupby("foo")
<pandas.core.groupby.groupby.DataFrameGroupBy object at 0x7f9a280cbb70>
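When you cannot rebuild the frame from scratch, a check like the following may help to spot and flatten accidental MultiIndex columns. It is a sketch that assumes only the first column level carries meaningful names; the variable names are made up for this example.

```python
import numpy as np
import pandas as pd

values = np.asarray([[1, 1], [2, 2], [3, 3]])
# Same accidental double brackets as above.
df = pd.DataFrame(values, columns=[["foo", "bar"]])

# The printed frame looks normal, so test the column index type directly.
has_multiindex_columns = isinstance(df.columns, pd.MultiIndex)
print(has_multiindex_columns)

# Keep only the first level so the columns become a flat Index again.
flat = df.copy()
flat.columns = flat.columns.get_level_values(0)

group_keys = sorted(flat.groupby("foo").groups)
print(list(flat.columns), group_keys)
```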
Upvotes: 11