Rearrange pandas dataframe

Question

I have a data table that looks like this:

Index | ref_num | year | jan_a | jan_b | jan_c | ... | dec_a | dec_b | dec_c
0     |   100   | 2000 | 10    | 15    |  8    | ... |    3  |  19   |  12
1     |   100   | 2001 | 8     | 18    |  2    | ... |    6  |  15   |  12
2     |   101   | 2000 | 18    | 11    |  0    | ... |    2  |  10   |  11
3     |   101   | 2001 | 9     | 13    |  9    | ... |    4  |  10   |  22

For clarity, ref_num is a unique identifier. There are multiple years per ref_num and all 12 months for each year, with 3 categories (a, b, c) per month. I need to get this into a format like:

Index | ref_num | date    | a  | b  | c
0     | 100     | 01/2000 | 10 | 15 | 8
...
11    | 100     | 12/2000 |  3 | 19 | 12
...
23    | 100     | 12/2001 |  6 | 15 | 12

Note: I don't care whether this is done in pandas or not. The data is in a CSV file, so if it's easier to do in numpy or by looping through the file, that works too (I've literally been staring at my screen, clueless, for hours now).

Henry Yik · Accepted Answer

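Reverse each underscored column name (jan_a becomes a_jan) so that pd.wide_to_long can use a, b and c as stub names, then combine year and month to build the date format you want. The reversal also turns ref_num into num_ref, which is why that name is used below and renamed back at the end. The code assumes Index is an ordinary column:

import pandas as pd

# "jan_a" -> "a_jan"; this also flips "ref_num" to "num_ref"
df.columns = [i if "_" not in i else "_".join(i.split("_")[::-1]) for i in df.columns]

# one row per (Index, num_ref, year, month), with the values in columns a, b, c
s = (pd.wide_to_long(df, stubnames=list("abc"),
                     i=["Index", "num_ref", "year"],
                     j="month", sep="_", suffix=".*")
       .reset_index())

# "2000 jan" -> "01/2000"
s["date"] = pd.to_datetime(s["year"].astype(str) + " " + s["month"]).dt.strftime("%m/%Y")

# drop the helper columns and restore the original ref_num name
print(s.drop(columns=["year", "month"]).rename(columns={"num_ref": "ref_num"}))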

   Index  ref_num   a   b   c     date
0      0      100  10  15   8  01/2000
1      0      100   3  19  12  12/2000
2      1      100   8  18   2  01/2001
3      1      100   6  15  12  12/2001
4      2      101  18  11   0  01/2000
5      2      101   2  10  11  12/2000
6      3      101   9  13   9  01/2001
7      3      101   4  10  22  12/2001
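For comparison, here is a minimal self-contained sketch of a different route to the same shape, using melt and pivot instead of pd.wide_to_long. It is not the method from the answer above. The sample frame holds only the jan and dec columns from the question, and the names MONTHS, MONTH_NUM, long and tidy are made up for illustration. It assumes every month column is named <month>_<category> with a lowercase three-letter month.

```python
import pandas as pd

# Hypothetical sample: only the jan and dec columns from the question's table.
df = pd.DataFrame({
    "ref_num": [100, 100, 101, 101],
    "year": [2000, 2001, 2000, 2001],
    "jan_a": [10, 8, 18, 9],
    "jan_b": [15, 18, 11, 13],
    "jan_c": [8, 2, 0, 9],
    "dec_a": [3, 6, 2, 4],
    "dec_b": [19, 15, 10, 10],
    "dec_c": [12, 12, 11, 22],
})

# Map month abbreviations to month numbers (assumed naming convention).
MONTHS = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]
MONTH_NUM = {name: n for n, name in enumerate(MONTHS, start=1)}

# Stack every month_category column: one row per (ref_num, year, column).
long = df.melt(id_vars=["ref_num", "year"], var_name="col", value_name="value")

# Split "jan_a" into month "jan" and category "a".
long[["month", "category"]] = long["col"].str.split("_", expand=True)
long["month_num"] = long["month"].map(MONTH_NUM)

# Spread the categories back out into columns a, b, c.
tidy = (long.pivot(index=["ref_num", "year", "month_num"],
                   columns="category", values="value")
            .reset_index()
            .sort_values(["ref_num", "year", "month_num"], ignore_index=True))

# Build the MM/YYYY label and keep only the requested columns.
tidy["date"] = tidy["month_num"].astype(str).str.zfill(2) + "/" + tidy["year"].astype(str)
tidy = tidy[["ref_num", "date", "a", "b", "c"]]
tidy.columns.name = None

print(tidy)
```

Mapping the month names to numbers explicitly keeps the rows in calendar order within each ref_num and year, and avoids relying on date-string parsing. With a real file, the sample frame would be replaced by something like pd.read_csv on your CSV.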
