Reshaping DataFrame and modifying a column based on other 24 columns

Question

I have an excel which has 26 columns.

Date, Unique ID and H01,H02,H03...H24

Here H{n} represents the hour, i.e. For UID some_code at 19/7/2017 01.00.00 the value was 199. At 19/7/2017 02.00.00 the value was 7 etc.

+--------------------+---------------+----------+---------------+
|       Date         | UID           | H01      | H02           |
+--------------------+---------------+----------+---------------+
| 19/7/2017 00.00.00 | some_code     |      199 |             7 |
| 19/7/2017 00.00.00 | another_code  |      164 |            18 |
| 19/7/2017 00.00.00 | new_code      |      209 |             1 |
| 19/7/2017 00.00.00 | code_5        |       85 |             4 |
| 19/7/2017 00.00.00 | what          |       45 |             6 |

I'm reading the excel and creating a DataFrame which looks similar to above.

I want to modify this DataFrame such that I get below.

+--------------------+---------------+----------+
|       Date         | UID           | Value    |
+--------------------+---------------+----------+
| 19/7/2017 01.00.00 | some_code     |      199 |
| 19/7/2017 02.00.00 | some_code     |        7 |
| 19/7/2017 03.00.00 | some_code     |      ... |
.................................................
.................................................
| 19/7/2017 00.00.00 | some_code     |      ... |
| 19/7/2017 01.00.00 | another_code  |      164 |
| 19/7/2017 02.00.00 | another_code  |       18 |
| 19/7/2017 03.00.00 | another_code  |       ...|
.................................................
.................................................
| 19/7/2017 00.00.00 | another_code  |       ...|

I'm new to Python and Pandas, unable to get my head around stack/unstack/pivot.

jezrael · Accepted Answer

You can use:

df['Date'] = pd.to_datetime(df['Date'], format='%d/%m/%Y %H.%M.%S')
df = df.set_index(['Date','UID'])
df.columns=pd.to_timedelta(df.columns.str.extract('(\d+)',expand=False).astype(int),unit='H')
df = df.stack().reset_index(name='Value')
df['Date'] = df['Date'] + df['level_2']
df = df.drop('level_2', axis=1)
print (df)
                 Date           UID  Value
0 2017-07-19 01:00:00     some_code    199
1 2017-07-19 02:00:00     some_code      7
2 2017-07-19 01:00:00  another_code    164
3 2017-07-19 02:00:00  another_code     18
4 2017-07-19 01:00:00      new_code    209
5 2017-07-19 02:00:00      new_code      1
6 2017-07-19 01:00:00        code_5     85
7 2017-07-19 02:00:00        code_5      4
8 2017-07-19 01:00:00          what     45
9 2017-07-19 02:00:00          what      6

For same format of dates add dt.strftime:

...
df['Date'] = (df['Date'] + df['level_2']).dt.strftime('%d/%m/%Y %H.%M.%S')
df = df.drop('level_2', axis=1)
print (df)
                  Date           UID  Value
0  19/07/2017 01.00.00     some_code    199
1  19/07/2017 02.00.00     some_code      7
2  19/07/2017 01.00.00  another_code    164
3  19/07/2017 02.00.00  another_code     18
4  19/07/2017 01.00.00      new_code    209
5  19/07/2017 02.00.00      new_code      1
6  19/07/2017 01.00.00        code_5     85
7  19/07/2017 02.00.00        code_5      4
8  19/07/2017 01.00.00          what     45
9  19/07/2017 02.00.00          what      6

Reshaping DataFrame and modifying a column based on other 24 columns

Answers (1)

Related Questions