Reputation: 339
I'm working with climate data in a pandas DataFrame that is currently in an wide table format where each row represents a year of data for a specific region and the weekly variables are in the column.
Is there a way that I can transform the table such that there is an additional 'week' column and the variable values for each week are listed in the columns?
For instance, my table looks like this currently, where the variable suffix indicates a week number:
ID | Year | precip1 | precip2 | precip3 | max_temp1 | max_temp2 | max_temp3 |
---|---|---|---|---|---|---|---|
1100 | 2000 | 5.3 | 3.0 | 3.1 | 13.3 | 15.3 | 3.1 |
1100 | 2001 | 6.6 | 3.2 | 1.1 | 11.3 | 12.3 | 6.1 |
5903 | 2000 | 3.4 | 0.5 | 2.1 | 10.3 | 18.3 | 8.1 |
5903 | 2001 | 1.7 | 3.8 | 8.1 | 12.3 | 16.3 | 5.1 |
But I want the resulting table to look like this:
ID | Year | Week | precip | max_temp |
---|---|---|---|---|
1100 | 2000 | 1 | 5.3 | 13.3 |
1100 | 2000 | 2 | 3.0 | 15.3 |
1100 | 2000 | 3 | 3.1 | 3.1 |
1100 | 2001 | 1 | 6.6 | 11.3 |
1100 | 2001 | 2 | 3.2 | 12.3 |
1100 | 2001 | 3 | 1.1 | 6.1 |
5903 | 2000 | 1 | 3.4 | 10.3 |
5903 | 2000 | 2 | 0.5 | 18.3 |
5903 | 2000 | 3 | 2.1 | 8.1 |
5903 | 2001 | 1 | 1.7 | 12.3 |
5903 | 2001 | 2 | 3.8 | 16.3 |
5903 | 2001 | 3 | 8.1 | 5.1 |
I've tried using pd.melt()
on the entire DataFrame but the resulting table isn't what I am looking for.
Upvotes: 0
Views: 178
Reputation: 1855
Let's name your initial DataFrame as dfso
. The following code will do exactly what you want using melt
:
# Unpivot precipitation columns
dfp = dfso[["ID", "Year", "precip1", "precip2", "precip3"]].melt(["ID", "Year"], var_name="Week", value_name="precip")
# Clean the Week column
dfp["Week"] = dfp["Week"].str.replace("precip", "")
# Unpivot max temperature columns
dft = dfso[["ID", "Year", "max_temp1", "max_temp2", "max_temp3"]].melt(["ID", "Year"], var_name="Week", value_name="max_temp")
# Clean the Week column
dft["Week"] = dft["Week"].str.replace("max_temp", "")
# Merge both for desired result
result = dfp.merge(dft, on=["ID", "Year", "Week"], how="inner")
Updated:
Use dft["Week"] = dft["Week"].str.replace("max_temp", "")
instead of dft["Week"].apply(lambda x: x.replace("max_temp", ""))
which is faster (this is the max_temp
case, but apply for both transformations). Thanks for the comment in the answer @tdy.
Another option (thanks to the comment on the question) could be:
result = pd.wide_to_long(
dfso,
i=["ID", "Year"],
stubnames=["precip", "max_temp"],
j="Week",
suffix="."
).reset_index()
Upvotes: 2