Reputation: 3747
So my dataset has some information by location for n dates. The problem is each date is actually a different column header. For example the CSV looks like
location name Jan-2010 Feb-2010 March-2010
A "test" 12 20 30
B "foo" 18 20 25
What I would like is for it to look like
location name Date Value
A "test" Jan-2010 12
A "test" Feb-2010 20
A "test" March-2010 30
B "foo" Jan-2010 18
B "foo" Feb-2010 20
B "foo" March-2010 25
My problem is I don't know how many dates are in the column (though I know they will always start after name)
Upvotes: 279
Views: 323988
Reputation: 353059
Use .melt
:
df.melt(id_vars=["location", "name"],
var_name="Date",
value_name="Value")
location name Date Value
0 A "test" Jan-2010 12
1 B "foo" Jan-2010 18
2 A "test" Feb-2010 20
3 B "foo" Feb-2010 20
4 A "test" March-2010 30
5 B "foo" March-2010 25
You can use pd.melt
to get most of the way there, and then sort:
>>> df
location name Jan-2010 Feb-2010 March-2010
0 A test 12 20 30
1 B foo 18 20 25
>>> df2 = pd.melt(df,
id_vars=["location", "name"],
var_name="Date",
value_name="Value")
>>> df2
location name Date Value
0 A test Jan-2010 12
1 B foo Jan-2010 18
2 A test Feb-2010 20
3 B foo Feb-2010 20
4 A test March-2010 30
5 B foo March-2010 25
>>> df2 = df2.sort(["location", "name"])
>>> df2
location name Date Value
0 A test Jan-2010 12
2 A test Feb-2010 20
4 A test March-2010 30
1 B foo Jan-2010 18
3 B foo Feb-2010 20
5 B foo March-2010 25
(Might want to throw in a .reset_index(drop=True)
, just to keep the output clean.)
Note: pd.DataFrame.sort
has been deprecated in favour of pd.DataFrame.sort_values
.
Upvotes: 423
Reputation: 832
If you want to swap your rows with columns & columns with rows then try the transpose method of pandas:
df.T
Check the reference link: https://note.nkmk.me/en/python-pandas-t-transpose/
Upvotes: 4
Reputation: 499
Adding a link to a notebook which you can duplicate, demonstrating @DMS's answer using pandas.melt
:
df.melt(id_vars=["location", "name"],
var_name="date",
value_name="value")
Upvotes: 5
Reputation: 862611
Use set_index
with stack
for MultiIndex Series
, then for DataFrame
add reset_index
with rename
:
df1 = (df.set_index(["location", "name"])
.stack()
.reset_index(name='Value')
.rename(columns={'level_2':'Date'}))
print (df1)
location name Date Value
0 A test Jan-2010 12
1 A test Feb-2010 20
2 A test March-2010 30
3 B foo Jan-2010 18
4 B foo Feb-2010 20
5 B foo March-2010 25
Upvotes: 30
Reputation: 164663
pd.wide_to_long
You can add a prefix to your year columns and then feed directly to pd.wide_to_long
. I won't pretend this is efficient, but it may in certain situations be more convenient than pd.melt
, e.g. when your columns already have an appropriate prefix.
df.columns = np.hstack((df.columns[:2], df.columns[2:].map(lambda x: f'Value{x}')))
res = pd.wide_to_long(df, stubnames=['Value'], i='name', j='Date').reset_index()\
.sort_values(['location', 'name'])
print(res)
name Date location Value
0 test Jan-2010 A 12
2 test Feb-2010 A 20
4 test March-2010 A 30
1 foo Jan-2010 B 18
3 foo Feb-2010 B 20
5 foo March-2010 B 25
Upvotes: 10
Reputation: 1186
I guess I found a simpler solution
temp1 = pd.melt(df1, id_vars=["location"], var_name='Date', value_name='Value')
temp2 = pd.melt(df1, id_vars=["name"], var_name='Date', value_name='Value')
Concat whole temp1
with temp2
's column name
temp1['new_column'] = temp2['name']
You now have what you asked for.
Upvotes: 8