Reputation: 23099
I have the following DF
ID, 1, 2, 3 #Columns
0,Date, Review, Average, Review # Observations
1,01/01/18 2, 4, 3 # Date and Review Score
2,02/01/18 1, 2, 4 #Date and Review Score
I'm attempting to melt this DF into the following, using the below code gets me close:
df = pd.melt(df,id_vars=['ID'],var_name=['Store'],value_name='Score').fillna(0).set_index('ID')
this procudes:
Store Score
ID
Date
01/01/18 1 Review
01/01/18 1 2
02/01/18 1 1
What I would like to do is remove the 'Review' and have it within its own column, something like the following
Store Review Type Score
ID
Date
01/01/18 1, Review, 1
02/01/18 1, Review, 2
I've attempted to do wide to long, but I think I would need to use some level of multiindexing here, or maybe i'm over thinking it.
Considerations :
my DF is 824 columns long and 324 rows my variables are row wise, along the date with the ID being the column header.
Upvotes: 0
Views: 221
Reputation: 14083
If I understand what you are looking for...
starting with this dataframe, which I believe is what you have:
ID 1 2 3
0 Date Review Average Review
1 01/01/18 2 4 3
2 02/01/18 1 2 4
assuming that you do your pd.melt()
you are then left with:
new_df = pd.melt(df,id_vars=['ID'],var_name=['Store'],value_name='Score').fillna(0).set_index('ID')
Store Score
ID
Date 1 Review
01/01/18 1 2
02/01/18 1 1
Date 2 Average
01/01/18 2 4
02/01/18 2 2
Date 3 Review
01/01/18 3 3
02/01/18 3 4
then you can do something like:
# sort index so all the 'Date' values are at the bottom
new_df.sort_index(inplace=True)
# create a new df of just the dates becuase that is your review types
review_types = new_df.loc['Date']
# rename column to review types
review_types.rename(columns={'Score':'Review Type'}, inplace=True)
# remove new_df.loc['Date']
# new_df = new_df.drop(new_df.tail(len(review_types)).index).reset_index()
# UPDATED removal of new_df.loc['Date']
# I recommend removing the date values by doing this and not using .tail()
new_df = new_df[~new_df.index.str.contains('Date')].reset_index()
# rename ID column to Date
new_df.rename(columns={'ID':'Date'}, inplace=True)
# merge your two dataframes together
new_df.merge(review_types, on='Store')
which gives you:
Date Store Score Review Type
0 01/01/18 1 2 Review
1 02/01/18 1 1 Review
2 01/01/18 2 4 Average
3 02/01/18 2 2 Average
4 01/01/18 3 3 Review
5 02/01/18 3 4 Review
Upvotes: 1