Reputation: 83
I'm trying to manipulate a dataframe in Pandas and running into some issues. I've looked at some variants of the questions that have been asked here and most of them involve using pivot and discarding some of the existing columns, I'm wondering if there's a way around that.
I created some simple data as an illustration that's similar to my existing data:
import pandas as pd
raw_data = {'FirstName': ["John", "Jill", "Jack", "John", "Jill", "Jack",],
'LastName': ["Blue", "Green", "Yellow","Blue", "Green", "Yellow"],
'Building': ["Building1", "Building1", "Building2","Building1", "Building1", "Building2"],
'Month': ["November", "November", "November", "December","December", "December"],
'Sales': [100, 150, 275, 200, 150, 150]}
frame = pd.DataFrame(raw_data, columns =raw_data.keys())
This produces a dataframe that looks like this:
OutputFrame What I want to do, is turn the months into columns, WHILE preserving the other data. So something like this:DesiredFrame
I've tried the suggestions to pivot from here: Pandas long to wide reshape, by two variables
I tried pivoting on months:
frame.pivot(columns = 'Month')
I tried adding more columns to see if it would clean up:
frame.pivot(columns = ('FirstName', 'LastName','Month'), values = 'Sales' )
In both cases I got some sort bizzare columns. I'm very curious to know what Pandas is doing here, but I have no idea how to make sense of this.
I figure I could loop through and just re-create the data, but I figure this has to be a better way?
Upvotes: 2
Views: 1472
Reputation: 41
I upvoted Murilo Cunha's answer above.
If you have a larger DataFrame and want a more general answer for a single column to make wide, you could use the following modification to Murilo's answer, so that the pivot index covers all other columns without having to specify them by name:
raw_data = {'FirstName': ["John", "Jill", "Jack", "John", "Jill", "Jack",],
'LastName': ["Blue", "Green", "Yellow","Blue", "Green", "Yellow"],
'Building': ["Building1", "Building1", "Building2", "Building1","Building1", "Building2"],
'Month': ["November", "November", "November", "December","December", "December"],
'Sales': [100, 150, 275, 200, 150, 150]}
frame = pd.DataFrame(raw_data, columns =raw_data.keys())
col_to_wide = "Month"
vals = "Sales"
# keep all other columns
keep_cols = [col for col in frame.columns if col not in [col_to_wide, vals]]
df = (
frame.pivot(
index=keep_cols,
columns=col_to_wide,
values=vals
)
.reset_index() # collapses multi-index
.rename_axis(None, axis=1) # renames index
)
df
Upvotes: 0
Reputation: 506
You were actually almost all the way there with pivot()
. Specifying the index
will take you almost all the way there:
import pandas as pd
raw_data = {'FirstName': ["John", "Jill", "Jack", "John", "Jill", "Jack",],
'LastName': ["Blue", "Green", "Yellow","Blue", "Green", "Yellow"],
'Building': ["Building1", "Building1", "Building2","Building1", "Building1", "Building2"],
'Month': ["November", "November", "November", "December","December", "December"],
'Sales': [100, 150, 275, 200, 150, 150]}
frame = pd.DataFrame(raw_data, columns =raw_data.keys())
df = frame.pivot(
index=["FirstName", "LastName", "Building"],
columns="Month",
values="Sales",
)
df
The only difference is that you will have a multi-level index in your dataframe. If you want to get exactly the desired output, you'd need to collapse the multi-index and rename the index (you can chain them as well)
import pandas as pd
raw_data = {'FirstName': ["John", "Jill", "Jack", "John", "Jill", "Jack",],
'LastName': ["Blue", "Green", "Yellow","Blue", "Green", "Yellow"],
'Building': ["Building1", "Building1", "Building2","Building1", "Building1", "Building2"],
'Month': ["November", "November", "November", "December","December", "December"],
'Sales': [100, 150, 275, 200, 150, 150]}
frame = pd.DataFrame(raw_data, columns =raw_data.keys())
df = (
frame.pivot(
index=["FirstName", "LastName", "Building"],
columns="Month",
values="Sales"
)
.reset_index() # collapses multi-index
.rename_axis(None, axis=1) # renames index
)
df
Upvotes: 4