Reputation: 605
My dataframe looks like this:
>df
ds          A  B  C
01/01/2010  4  2  1
02/01/2010  2  9  3
03/01/2010  1  3  0
where A and B belong to Category 1 and C belongs to Category 2.
I want to convert it into:
ds          Category  Company  Value
01/01/2010  1         A        4
01/01/2010  1         B        2
01/01/2010  2         C        1
and so on, for plotting later on.
Upvotes: 1
Views: 53
Reputation: 2032
We can use pd.melt followed by np.where:
df2 = pd.melt(df, id_vars=['ds'], value_vars=['A', 'B', 'C'])
df2['Category'] = np.where((df2['variable'] == 'A') | (df2['variable'] == 'B'), 1, 2)
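For reference, here is a self-contained sketch of this approach on the question's sample data. The `var_name`/`value_name` arguments, the `isin` test, and the final sort and column reordering are my own additions so that the result matches the layout asked for in the question; they are not part of the snippet above.

```python
import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'ds': ['01/01/2010', '02/01/2010', '03/01/2010'],
    'A': [4, 2, 1],
    'B': [2, 9, 3],
    'C': [1, 3, 0],
})

# Wide -> long: one row per (ds, Company) pair
df2 = pd.melt(df, id_vars=['ds'], value_vars=['A', 'B', 'C'],
              var_name='Company', value_name='Value')

# A and B are Category 1; anything else (here only C) is Category 2
df2['Category'] = np.where(df2['Company'].isin(['A', 'B']), 1, 2)

# ds is still a string here; sorting it as text happens to work for this
# sample, but convert it with pd.to_datetime for real data
df2 = df2.sort_values(['ds', 'Company']).reset_index(drop=True)
df2 = df2[['ds', 'Category', 'Company', 'Value']]
```

Note that `np.where` with a single condition only handles two categories; with more than two, a lookup per company is easier to maintain.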
Upvotes: 1
Reputation: 863226
Use DataFrame.melt:
df['ds'] = pd.to_datetime(df['ds'], format='%d/%m/%Y')
df = df.melt('ds', var_name='Company')
If there can be multiple categories, create a dictionary and build the new column with Series.map:
d = {1:['A','B'], 2:['C']}
#swap keys and values in the dict (company -> category)
#http://stackoverflow.com/a/31674731/2901002
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}
df['Category'] = df['Company'].map(d1)
#alternative1
#df['Category'] = np.where(df['Company'] == 'C', 2, 1)
#alternative2
#df['Category'] = np.where(df['Company'].isin(['A','B']), 1, 2)
df = df.sort_values(['ds','Company']).reset_index(drop=True)
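Put together, the melt and map steps above can be sketched as one runnable script. The names `categories`, `company_to_category` and `long_df`, and the `value_name='Value'` argument, are my own choices for illustration and do not appear in the original snippet.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'ds': ['01/01/2010', '02/01/2010', '03/01/2010'],
    'A': [4, 2, 1],
    'B': [2, 9, 3],
    'C': [1, 3, 0],
})
df['ds'] = pd.to_datetime(df['ds'], format='%d/%m/%Y')

# Category -> list of companies, as described in the question
categories = {1: ['A', 'B'], 2: ['C']}

# Invert it to company -> category so it can be used as a lookup
company_to_category = {
    company: category
    for category, companies in categories.items()
    for company in companies
}

# Wide -> long, then look up each company's category
long_df = df.melt('ds', var_name='Company', value_name='Value')
long_df['Category'] = long_df['Company'].map(company_to_category)
long_df = long_df.sort_values(['ds', 'Company']).reset_index(drop=True)
```

A company that is missing from the dictionary gets NaN in `Category`, which makes gaps in the mapping easy to spot.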
Or use DataFrame.set_index with DataFrame.stack:
df['ds'] = pd.to_datetime(df['ds'], format='%d/%m/%Y')
df = df.set_index('ds').stack().rename_axis(('ds','Company')).reset_index(name='value')
df['Category'] = np.where(df['Company'] == 'C', 2, 1)
print(df)
          ds Company  value  Category
0 2010-01-01       A      4         1
1 2010-01-01       B      2         1
2 2010-01-01       C      1         2
3 2010-01-02       A      2         1
4 2010-01-02       B      9         1
5 2010-01-02       C      3         2
6 2010-01-03       A      1         1
7 2010-01-03       B      3         1
8 2010-01-03       C      0         2
Upvotes: 2