Reputation: 33
I have a dataset that looks as following:
Date | Time | Variable | Unit | Value |
---|---|---|---|---|
05-02-2021 | 10:00:00 | Freq | Mhz | 10 |
05-02-2021 | 10:00:10 | Freq | Mhz | 13 |
05-02-2021 | 10:00:20 | Freq | Mhz | 12 |
05-02-2021 | 10:00:30 | Freq | Mhz | 15 |
05-02-2021 | 10:00:40 | Freq | Mhz | 11 |
05-02-2021 | 10:00:50 | Freq | Mhz | 13 |
05-02-2021 | 10:01:00 | Freq | Mhz | 11 |
05-02-2021 | 10:00:00 | Distance | m | 102 |
05-02-2021 | 10:00:20 | Distance | m | 107 |
05-02-2021 | 10:00:30 | Distance | m | 110 |
05-02-2021 | 10:00:40 | Distance | m | 108 |
05-02-2021 | 10:00:50 | Distance | m | 104 |
05-02-2021 | 10:01:00 | Distance | m | 107 |
etc | etc | etc | etc | etc |
So I have the variables and its corresponding units are in the same column, however I would like them formatted differently. The time interval at which the different variables (Mhz, Distance) are measured is the same (every 10 seconds), although I would like an option independent of this.
I would like my data to look like the following, so it is easier to work with:
DateTime | Frequency-Mhz | Distance-m |
---|---|---|
05-02-2021 10:00:00 | 10 | 102 |
05-02-2021 10:00:10 | 13 | 107 |
05-02-2021 10:00:20 | 12 | 110 |
05-02-2021 10:00:30 | 15 | 108 |
05-02-2021 10:00:40 | 11 | 104 |
05-02-2021 10:00:50 | 13 | 102 |
05-02-2021 10:01:00 | 11 | 107 |
So far, I have tried various solutions with pd.pitvot_tables, but to no avail. I get close to the solution but not quite yet. I have also tried solutions with .melt or .groupby to built a entire new dataframe, but these are failed.
df['datetime'] = pd.to_datetime(df['Date'] + " " + df['Time'])
df.pivot_table(index=['datetime'], columns='Variable', values='Value')
Upvotes: 2
Views: 311
Reputation: 862731
You are close, need join columns and then convert Value
to numeric, if not matched return NaNs:
df['datetime'] = pd.to_datetime(df['Date'] + " " + df['Time'])
df['Variable'] = df['Variable'] + "-" + df['Unit']
df['Value'] = pd.to_numeric(df['Value'], errors='coerce')
df.pivot(index='datetime', columns='Variable', values='Value')
If possible duplication in datetime
with Variable
use:
df.pivot_table(index='datetime', columns='Variable', values='Value')
Upvotes: 4