Reputation: 1075
I've just placed a similar question here and got an answer but recognised, that by adding a new column to a DataFrame the presented solution fails as the problem is a bit different.
I want to go from here:
import pandas as pd
df = pd.DataFrame({'ID': [1, 2],
'Value_2013': [100, 200],
'Value_2014': [245, 300],
'Value_2016': [200, float('NaN')]})
print(df)
ID Value_2013 Value_2014 Value_2016
0 1 100 245 200.0
1 2 200 300 NaN
to:
df_new = pd.DataFrame({'ID': [1, 1, 1, 2, 2],
'Year': [2013, 2014, 2016, 2013, 2014],
'Value': [100, 245, 200, 200, 300]})
print(df_new)
ID Value Year
0 1 100 2013
1 1 245 2014
2 1 200 2016
3 2 200 2013
4 2 300 2014
Any ideas how I can face this challenge?
Upvotes: 4
Views: 5034
Reputation: 28709
One option is with the pivot_longer function from pyjanitor, using the .value
placeholder:
# pip install pyjanitor
import pandas as pd
import janitor
(df
.pivot_longer(
index = "ID",
names_to=(".value", "Year"),
names_sep="_",
sort_by_appearance=True)
.dropna()
)
ID Year Value
0 1 2013 100.0
1 1 2014 245.0
2 1 2016 200.0
3 2 2013 200.0
4 2 2014 300.0
The .value
keeps the part of the column associated with it as header, while the rest goes into the Year
column.
Upvotes: 0
Reputation: 210882
Yet another soution (two steps):
In [31]: x = df.set_index('ID').stack().astype(int).reset_index(name='Value')
In [32]: x
Out[32]:
ID level_1 Value
0 1 Value_2013 100
1 1 Value_2014 245
2 1 Value_2016 200
3 2 Value_2013 200
4 2 Value_2014 300
In [33]: x = x.assign(Year=x.pop('level_1').str.extract(r'(\d{4})', expand=False))
In [34]: x
Out[34]:
ID Value Year
0 1 100 2013
1 1 245 2014
2 1 200 2016
3 2 200 2013
4 2 300 2014
Upvotes: 0
Reputation: 6322
The pandas.melt()
method gets you halfway there. After that it's just some minor cleaning up.
df = pd.melt(df, id_vars='ID', var_name='Year', value_name='Value')
df['Year'] = df['Year'].map(lambda x: x.split('_')[1])
df = df.dropna().astype(int).sort_values(['ID', 'Year']).reset_index(drop=True)
df = df.reindex_axis(['ID', 'Value', 'Year'], axis=1)
print(df)
ID Value Year
0 1 100 2013
1 1 245 2014
2 1 200 2016
3 2 200 2013
4 2 300 2014
Upvotes: 4
Reputation: 1345
Another option is pd.wide_to_long(). Admittedly it doesn't give you exactly the same output but you can clean up as needed.
pd.wide_to_long(df, ['Value_',], i='', j='Year')
ID Value_
Year
NaN 2013 1 100
2013 2 200
2014 1 245
2014 2 300
2016 1 200
2016 2 NaN
Upvotes: 1
Reputation: 1062
import numpy as np
import pandas as pd
from collections import OrderedDict
df = pd.DataFrame({'ID': [1, 2],
'Value_2013': [100, 200],
'Value_2014': [245, 300],
'Value_2016': [200, float('NaN')]})
# Set ID column as Index
df = df.set_index('ID')
# unstack all columns, swap the levels in the row index
# and convert series to df
df = df.unstack().swaplevel().to_frame().reset_index()
# Rename columns as desired
df.columns = ['ID', 'Year', 'Value']
# Transform the year values from Value_2013 --> 2013 and so on
df['Year'] = df['Year'].apply(lambda x : x.split('_')[1]).astype(np.int)
# Sort by ID
df = df.sort_values(by='ID').reset_index(drop=True).dropna()
print(df)
ID Year Value
0 1 2013 100.0
1 1 2014 245.0
2 1 2016 200.0
3 2 2013 200.0
4 2 2014 300.0
Upvotes: 1
Reputation: 863166
You need add set_index
first:
df = df.set_index('ID')
df.columns = df.columns.str.split('_', expand=True)
df = df.stack().rename_axis(['ID','Year']).reset_index()
df.Value = df.Value.astype(int)
#if order of columns is important
df = df.reindex_axis(['ID','Value','Year'], axis=1)
print (df)
ID Value Year
0 1 100 2013
1 1 245 2014
2 1 200 2016
3 2 200 2013
4 2 300 2014
Upvotes: 3