Chinmay
Chinmay

Reputation: 145

Convert multiple row data into single column

I have a dataframe

     year month     v_0     v_1   ...      v_27    v_28    v_29    v_30
0    1919    03     0.0     0.0   ...       0.0    13.0     0.0     0.0
1    1919    04     5.0     0.0   ...       0.0     0.0     0.0 -9999.0
2    1919    05     0.0     0.0   ...       0.0     0.0     0.0     0.0

The headers are year,month and every day in the month. I want to convert the seperate headers for every day into a single column which would be the day number for every month followed by the data. It should look something like this.

Year         Month  Day  Value

1919           3     1    0

1919           3     2    0

Upvotes: 2

Views: 132

Answers (2)

rje
rje

Reputation: 6438

You want to use DataFrame.melt():

Considering the following dataframe df:

        Year    Month   v_1     v_2
  0     1901    2       4       8
  1     1902    3       5       9
  2     1903    4       6       10
  3     1904    5       7       11

Calling df.melt(id_vars=['Year', 'Month'], var_name='Day') yields this result:

       Year     Month   Day         value
  0     1901    2       v_1             4
  1     1902    3       v_1             5
  2     1903    4       v_1             6
  3     1904    5       v_1             7
  4     1901    2       v_2             8
  5     1902    3       v_2             9
  6     1903    4       v_2             10
  7     1904    5       v_2             11

What happens here is that melt moves all data from the columns other than Year and Month (the id_vars), into a new column "value", and the column names (v_0, v_1 etc) will become a new column called "variable". We can set the name of this new column with the var_name argument, which I have set to 'Day' above.

Actually, because the names of the day columns begin with "v_", I would start by renaming these columns:

df.rename(axis='columns', mapper=lambda s: s.split('_')[-1], inplace=True)
df.melt(id_vars=['Year', 'Month'], var_name='Day')

Upvotes: 1

BENY
BENY

Reputation: 323376

You may need to check wide_to_long

pd.wide_to_long(df,'v',i=['year','month'],j='day',sep='_').reset_index()
Out[108]: 
    year  month day       v
0   1919      3   0     0.0
1   1919      3   1     0.0
2   1919      3  27     0.0
3   1919      3  28    13.0
4   1919      3  29     0.0
5   1919      3  30     0.0
6   1919      4   0     5.0
7   1919      4   1     0.0
8   1919      4  27     0.0
9   1919      4  28     0.0
10  1919      4  29     0.0
11  1919      4  30 -9999.0
12  1919      5   0     0.0
13  1919      5   1     0.0
14  1919      5  27     0.0
15  1919      5  28     0.0
16  1919      5  29     0.0
17  1919      5  30     0.0

Upvotes: 2

Related Questions