Reputation: 1075
I'm trying to gather the columns of a pandas DataFrame into key-value pairs, with each pair listed as a row, in Python. Taking the following DataFrame as an example, I want to go from here:
import pandas as pd
from collections import OrderedDict
df = pd.DataFrame({'value_2016': [200],
                   'value_2017': [300],
                   'value_2018': [float('NaN')]})
print(df)
value_2016 value_2017 value_2018
0 200 300 NaN
to:
df_result = pd.DataFrame(OrderedDict({'year': [2016, 2017],
                                      'value': [200, 300]}))
print(df_result)
year value
0 2016 200
1 2017 300
If you are familiar with R, the equivalent would be something like this:
require("plyr"); require("dplyr"); require(tidyr)
df <- data.frame(value_2016 = 200,
                 value_2017 = 300,
                 value_2018 = NA)
df %>%
  gather(year, value, value_2016:value_2018) %>%
  mutate(year = gsub(x = .$year, replacement = "", "value_")) %>%
  na.exclude
year value
1 2016 200
2 2017 300
Any help would be very cool!
Upvotes: 4
Views: 2703
Reputation: 3825
Or using datar:
>>> from datar.all import f, NA, tribble, pivot_longer, everything, drop_na
>>>
>>> df = tribble(
... f.value_2016, f.value_2017, f.value_2018,
... 200, 300, NA
... )
>>> df
value_2016 value_2017 value_2018
<int64> <int64> <float64>
0 200 300 NaN
>>>
>>> pivot_longer(df, everything()) >> drop_na()
name value
<object> <float64>
0 value_2016 200.0
1 value_2017 300.0
Upvotes: 0
Reputation: 49
Another solution using melt:
ipdb> pd.melt(df.rename(columns=lambda x: x.split('_')[-1]), var_name="year", value_name="value").dropna()
year value
0 2016 200.0
1 2017 300.0
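For readers who want to run the melt approach end to end, here is a minimal self-contained sketch. The helper name gather_years and its prefix parameter are made up for illustration; it also converts both columns to integers, which assumes every remaining value is a whole number.

```python
import pandas as pd

df = pd.DataFrame({"value_2016": [200],
                   "value_2017": [300],
                   "value_2018": [float("nan")]})


def gather_years(wide, prefix="value_"):
    """Hypothetical helper: reshape wide 'value_<year>' columns to long form."""
    return (
        wide.melt(var_name="year", value_name="value")  # one row per original column
            .dropna(subset=["value"])                    # drop years with no value
            .assign(
                # strip the literal prefix, then cast the year to int
                year=lambda d: d["year"].str.replace(prefix, "", regex=False).astype(int),
                # assumes the remaining values are whole numbers
                value=lambda d: d["value"].astype(int),
            )
            .reset_index(drop=True)
    )


result = gather_years(df)
print(result)
```

Stripping the prefix after melting, rather than renaming the columns first, leaves the original DataFrame untouched.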
Upvotes: 2
Reputation: 76927
You could use rename, stack and reset_index:
In [4912]: (df.rename(columns=lambda x: x.split('_')[-1]).stack()
              .reset_index(level=0, drop=True)
              .rename_axis('year')
              .reset_index(name='value'))
Out[4912]:
year value
0 2016 200.0
1 2017 300.0
Upvotes: 0
Reputation: 862731
You can create a MultiIndex by split and then reshape by stack:
df.columns = df.columns.str.split('_', expand=True)
df = df.stack().reset_index(level=0, drop=True).rename_axis('year').reset_index()
# if necessary, convert float to int
df.value = df.value.astype(int)
print(df)
year value
0 2016 200
1 2017 300
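The integer conversion above only works if stack has already dropped the NaN row. As far as I know, older pandas versions do that by default, while the newer stack implementation may keep missing values. A defensive sketch that calls dropna explicitly, so it should not depend on that default (the names wide and long are just illustrative):

```python
import pandas as pd

df = pd.DataFrame({"value_2016": [200],
                   "value_2017": [300],
                   "value_2018": [float("nan")]})

wide = df.copy()  # keep the original frame untouched
# 'value_2016' -> ('value', '2016'): two-level column MultiIndex
wide.columns = wide.columns.str.split("_", expand=True)

long = (
    wide.stack()                           # move the year level into the row index
        .dropna()                          # explicit, in case stack keeps the NaN row
        .reset_index(level=0, drop=True)   # discard the original row number
        .rename_axis("year")
        .reset_index()
        .assign(value=lambda d: d["value"].astype(int))
)
print(long)
```

Note that the year column holds strings here ('2016', '2017'), because it comes from the split column labels; add .astype(int) on it if numeric years are needed.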
If you want to use the DataFrame constructor, use get_level_values:
df.columns = df.columns.str.split('_', expand=True)
df = df.stack()
df_result = pd.DataFrame(OrderedDict({'year': df.index.get_level_values(1),
'value': df['value'].astype(int).values}))
print(df_result)
year value
0 2016 200
1 2017 300
Upvotes: 1