Wonton
Wonton

Reputation: 339

Convert yearly wide table to weekly long table

I'm working with climate data in a pandas DataFrame that is currently in an wide table format where each row represents a year of data for a specific region and the weekly variables are in the column.

Is there a way that I can transform the table such that there is an additional 'week' column and the variable values for each week are listed in the columns?


For instance, my table looks like this currently, where the variable suffix indicates a week number:

ID Year precip1 precip2 precip3 max_temp1 max_temp2 max_temp3
1100 2000 5.3 3.0 3.1 13.3 15.3 3.1
1100 2001 6.6 3.2 1.1 11.3 12.3 6.1
5903 2000 3.4 0.5 2.1 10.3 18.3 8.1
5903 2001 1.7 3.8 8.1 12.3 16.3 5.1

But I want the resulting table to look like this:

ID Year Week precip max_temp
1100 2000 1 5.3 13.3
1100 2000 2 3.0 15.3
1100 2000 3 3.1 3.1
1100 2001 1 6.6 11.3
1100 2001 2 3.2 12.3
1100 2001 3 1.1 6.1
5903 2000 1 3.4 10.3
5903 2000 2 0.5 18.3
5903 2000 3 2.1 8.1
5903 2001 1 1.7 12.3
5903 2001 2 3.8 16.3
5903 2001 3 8.1 5.1

I've tried using pd.melt() on the entire DataFrame but the resulting table isn't what I am looking for.

Upvotes: 0

Views: 178

Answers (1)

bruno-uy
bruno-uy

Reputation: 1855

Let's name your initial DataFrame as dfso. The following code will do exactly what you want using melt:

# Unpivot precipitation columns
dfp = dfso[["ID", "Year", "precip1", "precip2", "precip3"]].melt(["ID", "Year"], var_name="Week", value_name="precip")
# Clean the Week column
dfp["Week"] = dfp["Week"].str.replace("precip", "")

# Unpivot max temperature columns
dft = dfso[["ID", "Year", "max_temp1", "max_temp2", "max_temp3"]].melt(["ID", "Year"], var_name="Week", value_name="max_temp")
# Clean the Week column
dft["Week"] = dft["Week"].str.replace("max_temp", "")

# Merge both for desired result
result = dfp.merge(dft, on=["ID", "Year", "Week"], how="inner")

Updated:

Use dft["Week"] = dft["Week"].str.replace("max_temp", "") instead of dft["Week"].apply(lambda x: x.replace("max_temp", "")) which is faster (this is the max_temp case, but apply for both transformations). Thanks for the comment in the answer @tdy.

Another option (thanks to the comment on the question) could be:

result = pd.wide_to_long(
    dfso,
    i=["ID", "Year"],
    stubnames=["precip", "max_temp"],
    j="Week",
    suffix="."
).reset_index()

Upvotes: 2

Related Questions