Reputation: 3249
I want to reformat a dataframe by transposing some elements of rows into columns. To provide an example of what I meant.
In the below dataframe, I want all the elements in the code column to be individual columns. And the missing rows like 'JFK 10/06 XX' should be populated as 0 or nan.
Original DataFrame:
loc date code
--- ----- ----
LGA 10/05 XX
LGA 10/06 XX
LGA 10/07 XX
LGA 10/05 YY
LGA 10/06 YY
LGA 10/07 YY
JFK 10/05 XX
###JFK 10/06 XX (missing)
JFK 10/07 XX
JFK 10/05 YY
JFK 10/06 YY
JFK 10/07 YY
To be formatted DataFrame:
loc date XX YY
--- ----- -- --
LGA 10/05 1 1
LGA 10/06 1 1
LGA 10/07 1 1
JFK 10/05 1 1
JFK 10/06 0 1
JFK 10/07 1 1
Here JFK -> 10/06 has an entry 0 in XX
I tried grouping by with the rest of the columns and able to verify the counts, but I couldn't format it in the expected way.
Upvotes: 1
Views: 2402
Reputation: 29719
Another solution using crosstab
which computes the frequency of occurence of the values present in the columns
argument:
pd.crosstab(index=[df['loc'], df['date']], columns=df['code']) \
.reset_index(level=1) \
.sort_index(ascending=False)
Note: It's not a good practice to name the columns as 'loc' which coincidentally also is a method used by pandas
for performing label based location indexing.
Upvotes: 1
Reputation: 1599
stack
the dataframenew_df = (df.set_index(keys=['loc','date'])
.stack()
.reset_index()
.pivot_table(index=['loc','date'], columns=0, fill_value=0, aggfunc='size'))
OUTPUT:
0 XX YY
loc date
JFK 2016-10-05 1 1
2016-10-06 0 1
2016-10-07 1 1
LGA 2016-10-05 1 1
2016-10-06 1 1
2016-10-07 1 1
Upvotes: 0
Reputation: 215077
You are trying to reshape your data to wide format without a value column. One option is to use pivot_table
and specify the size
as the aggregate function, which will count the combinations of index and columns and fill as values. Missing values can be replaced with the fill_value
parameter:
df.pivot_table(index = ['loc', 'date'], columns = 'code',
aggfunc='size', fill_value=0).reset_index()
#code loc date XX YY
# 0 JFK 10/05 1 1
# 1 JFK 10/06 0 1
# 2 JFK 10/07 1 1
# 3 LGA 10/05 1 1
# 4 LGA 10/06 1 1
# 5 LGA 10/07 1 1
Upvotes: 4