Reputation: 141
I have a pandas data frame, df, that has 4 columns and a lot of rows.
I want to create 5 different data frames based on the value of one of the columns of the data frame. The column I am referring to is called color.
color has 5 unique values: red, blue, green, yellow, orange.
What I want is for each of the 5 new data frames to contain all rows that have one of the values in color. For instance, df_blue should have all the rows and columns of the original data frame where the value in the color column is blue.
The code I have is the following:
# create 5 new data frames
df_red = []
df_blue = []
df_green = []
df_yellow = []
df_orange = []
for i in range(len(df)):
    if df['color'] == "blue":
        df_blue.append(df)
# i would do if-else statements to satisfy all 5 colors
I feel I am missing some logic...any suggestions or comments?
Thanks!
Upvotes: 0
Views: 871
Reputation: 141
I ended up doing this for each of the colors.
blue_data = data[data.color == 'blue']
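The same boolean mask can be applied to every color without writing the line out five times. Here is a minimal sketch, assuming a small made-up frame named data with a color column; the sample values and the name frames_by_color are illustrative and not from my real data:

```python
import pandas as pd

# Hypothetical sample data standing in for the real frame
data = pd.DataFrame({
    'color': ['red', 'blue', 'red', 'green', 'blue'],
    'foo': [1, 2, 3, 4, 5],
})

# Build one filtered frame per distinct color using a boolean mask
frames_by_color = {
    color: data[data['color'] == color]
    for color in data['color'].unique()
}

# Each entry holds only the rows whose color matches the key
blue_data = frames_by_color['blue']
print(blue_data)
```

Keeping the frames in a dictionary avoids creating five separately named variables, and it keeps working if a sixth color shows up later.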
Upvotes: -1
Reputation: 57033
You need to use groupby. The following code fragment creates a sample DataFrame and converts it into a dictionary where colors are keys and the matching dataframes are values:
import pandas as pd

df = pd.DataFrame({'color': ['red', 'blue', 'red', 'green', 'blue'],
                   'foo': [1, 2, 3, 4, 5]})
# Iterating over a groupby yields (key, sub-DataFrame) pairs
colors = {color: dfc for color, dfc in df.groupby('color')}
#{'blue':    color  foo
#          1   blue    2
#          4   blue    5,
# 'green':   color  foo
#          3  green    4,
# 'red':     color  foo
#          0    red    1
#          2    red    3}
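Once the dictionary exists, each color's frame is a plain key lookup. The sketch below reuses the sample frame from above; the names df_blue, df_red and df_orange are only illustrative, and the empty-frame fallback for a color that is missing from the data is one possible choice rather than the only one:

```python
import pandas as pd

df = pd.DataFrame({'color': ['red', 'blue', 'red', 'green', 'blue'],
                   'foo': [1, 2, 3, 4, 5]})

colors = {color: dfc for color, dfc in df.groupby('color')}

# Look up one color's rows by key
df_blue = colors['blue']

# Alternative: pull a single group without building the whole dictionary
df_red = df.groupby('color').get_group('red')

# A color that never occurs has no key; fall back to an empty frame
# with the same columns (df.iloc[0:0] selects zero rows)
df_orange = colors.get('orange', df.iloc[0:0])
```

Each sub-frame keeps the row index of the original frame, so the rows can still be traced back to df.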
Upvotes: 3