Reputation: 538
I have a data table that looks like this:
Index | ref_num | year | jan_a | jan_b | jan_c | ... | dec_a | dec_b | dec_c
0 | 100 | 2000 | 10 | 15 | 8 | ... | 3 | 19 | 12
1 | 100 | 2001 | 8 | 18 | 2 | ... | 6 | 15 | 12
2 | 101 | 2000 | 18 | 11 | 0 | ... | 2 | 10 | 11
3 | 101 | 2001 | 9 | 13 | 9 | ... | 4 | 10 | 22
For clarity, ref_num is a unique identifier. There are multiple years, and all 12 months for each year, with 3 categories per month. I need to get this into a format like:
Index | ref_num | date | a | b | c
0 | 100 | 01/2000 | 10 | 15 | 8
...
11 | 100 | 12/2000 | 3 | 19 | 12
...
23 | 100 | 12/2001 | 6 | 15 | 12
Note: I don't care whether this is done in pandas or not. The data is in a CSV file, so if it's easier to do in numpy or by looping through the file, that works too (I've literally been staring at my screen, clueless, for hours now).
Upvotes: 0
Views: 42
Reputation: 22493
Reorder the names of the columns so you can use pd.wide_to_long, then combine year and month to construct your desired date format:
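import pandas as pd

# Assumes df holds the question's table, with "Index" as an ordinary column.
# Flip "jan_a" -> "a_jan" so the category is the stub; "ref_num" becomes "num_ref" too.
df.columns = [i if "_" not in i else "_".join(i.split("_")[::-1]) for i in df.columns]
s = (pd.wide_to_long(df, stubnames=list("abc"),
                     i=["Index", "num_ref", "year"],
                     j="month", sep="_", suffix=".*")
       .reset_index())
# Parse e.g. "2000 jan" and format it as "01/2000".
s["date"] = pd.to_datetime(s["year"].astype(str) + " " + s["month"], format="%Y %b").dt.strftime("%m/%Y")
print(s.drop(columns=["year", "month"]).rename(columns={"num_ref": "ref_num"}))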
   Index  ref_num   a   b   c     date
0      0      100  10  15   8  01/2000
1      0      100   3  19  12  12/2000
2      1      100   8  18   2  01/2001
3      1      100   6  15  12  12/2001
4      2      101  18  11   0  01/2000
5      2      101   2  10  11  12/2000
6      3      101   9  13   9  01/2001
7      3      101   4  10  22  12/2001
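Since the question says pandas is optional, the same reshaping can also be sketched with only the standard csv module. This is a rough illustration, not a tested drop-in: the helper name wide_to_long_rows, the in-memory sample, and the assumption that month columns are named with lowercase three-letter abbreviations (jan_a ... dec_c) are all mine, and the sample only includes January and December to keep it short.

```python
import csv
import io

# Month abbreviations as assumed to appear in the column names, in calendar order.
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]
CATEGORIES = ["a", "b", "c"]


def wide_to_long_rows(reader):
    """Yield one dict per (ref_num, year, month) from wide-format rows."""
    for row in reader:
        for number, month in enumerate(MONTHS, start=1):
            # Skip months that are missing from the file (true for this short sample).
            if f"{month}_a" not in row:
                continue
            out = {"ref_num": row["ref_num"],
                   "date": f"{number:02d}/{row['year']}"}
            for cat in CATEGORIES:
                out[cat] = row[f"{month}_{cat}"]  # values stay strings
            yield out


# Hypothetical two-month sample standing in for the real CSV file.
sample = io.StringIO(
    "ref_num,year,jan_a,jan_b,jan_c,dec_a,dec_b,dec_c\n"
    "100,2000,10,15,8,3,19,12\n"
    "100,2001,8,18,2,6,15,12\n"
)
long_rows = list(wide_to_long_rows(csv.DictReader(sample)))

# Write the long-format rows back out as CSV text.
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["ref_num", "date", "a", "b", "c"])
writer.writeheader()
writer.writerows(long_rows)
print(buffer.getvalue())
```

For a real file, replacing the io.StringIO objects with open(...) handles (opened with newline="") should be enough. The pandas version is still likely the more convenient choice if the data is needed for further analysis.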
Upvotes: 1