Dimi
Dimi

Reputation: 541

How to overcome the Wrong merging When using Pivot_Table?

I have Two Dataframes ,What I'm trying to to do , is to create new columns based on the Sensor_Type and add them to DF1 and assign each measurement based on the Sensor.

DF1:

  Weather_Legends.head():

  Sensor_ID Sensor_Street_Name  Sensor_Lat  Sensor_Long   Sensor_Type         UOM
 0  6030    Milano - via Brera  45.471192   9.187616    Wind Direction      degree
 1  5897    Milano - via Brera  45.471192   9.187616      Temperature   Celsius degree
 2  6174    Milano - via Brera  45.471192   9.187616    Relative Humidity      %
 3  6120    Milano - via Brera  45.471192   9.187616      Wind Speed         m/s
 4  2006    Milano - via Lambrate  45.490051  9.22559   Precipitation        mm

DF2:

  Mi_Meteo.head():

  Sensor_ID    Time_Instant     Measurement
0   14121   2013-11-01 01:00:00   0.8
1   14121   2013-11-01 02:00:00   0.6
2   14121   2013-11-01 03:00:00   0.4
3   14121   2013-11-01 04:00:00   0.4
4   14121   2013-11-01 05:00:00    0

And this is the desired output:

Sensor_Type Sensor_ID   Sensor_Street_Name             Time_Instant      Precipitation     Relative Humidity    Wind Speed  …..
     0      14121     Milano - via Ippolito Rosellini   2013-11-01 01:00:00   0.8              NaN              NaN
     1      14121     Milano - via Ippolito Rosellin    2013-11-01 02:00:00   NaN              0.6              NaN
     2      14121     Milano - via Ippolito Rosellini   2013-11-01 03:00:00   0.4               NaN             NaN
     .
     .
     .

And Instead , These Is What I get:

 Sensor_Type Sensor_ID  Sensor_Street_Name               Time_Instant     Precipitation
   0          14121    Milano - via Ippolito Rosellini   2013-11-01 01:00:00    0.8
   1          14121    Milano - via Ippolito Rosellini   2013-11-01 02:00:00    0.6
   2          14121    Milano - via Ippolito Rosellini   2013-11-01 03:00:00    0.4
   .
   .

The Other Sensor Type are Missing !!!

And Here's The Code That I Used:

    Mi_Meteo['Measurement'] = Mi_Meteo['Measurement'].str.rstrip(' Measure').str.strip()
    Mi_Meteo['Measurement'] = pd.to_numeric(Mi_Meteo['Measurement'] ,errors='coerce' )
    Mi_Meteo['Sensor_ID'] = Mi_Meteo['Sensor_ID'].str.rstrip(' ID').str.strip()
    Mi_Meteo['Sensor_ID'] = pd.to_numeric(Mi_Meteo['Sensor_ID'] ,errors='coerce' )

    Mi_Meteo['Measurement'] = Mi_Meteo['Measurement'].astype(float)
    Mi_Meteo['Sensor_ID'] = Mi_Meteo['Sensor_ID'].astype(float)

    df4 = Mi_Meteo.merge(Weather_Legends, on='Sensor_ID', how='left')\
                  .pivot_table(index=['Sensor_ID' ,'Sensor_Street_Name' , 'Time_Instant' ], 
                               values= 'Measurement', 
                               columns='Sensor_Type')\
                  .reset_index()
    df4['Sensor_ID'] = df4['Sensor_ID'].astype(int)

Any Suggestion Would Be Much Appreciated , Thank U all.

Upvotes: 1

Views: 52

Answers (1)

Pythonista anonymous
Pythonista anonymous

Reputation: 8970

In the absence of your data, I made it up with some random data which mirrors what I understand is the structure of yours.

The code below works and gives you what I believe you're looking for. It is similar to what you seem to have done, so I can only guess there must be some errors in the structure of your data.

This is going to sound really banal, but have you verified that there are readings for the other sensors, too (wind etc)? Based on the output, it seems there are readings only for precipitation.

Your 'desired output' doesn't make sense because it shows that the same sensor id records rain and humidity; but you said each sensor id is associated to only one sensor type!

E.g. maybe 'sensor id' is not a primary key after the formulae you have applied? Look up these concepts here: https://www.essentialsql.com/what-is-the-difference-between-a-primary-key-and-a-foreign-key/ or in one of the trillion resources available online. Check the structure of your data.

Don't mean to sound like a broken record, but, again: primary and foreign keys! You really really really really need to understand the structure of your data.

If, after going over these concepts and after reviewing your data, you still can't see what's wrong with it, you might try uploading the data to dropbox or something similar, posting a link here and hoping some of us have enough time to go over it.

import numpy as np
import pandas as pd
import random

num_sensors=int(100)
sensors= pd.DataFrame()
sensors['sensor id']=np.arange(0,num_sensors)
sensors['address id'] =np.arange(1000,1000+num_sensors)
#;ambda function not efficient on large datasets but irrelevant here
sensors['type']= sensors.apply( lambda x: "".join( [random.choice(['rain','temp','wind']) ] ), axis=1 )

num_measurements = 10
meas = pd.DataFrame()
meas['sensor id']= np.repeat(sensors['sensor id'], num_measurements )
meas['time'] = np.tile( np.arange(0,num_measurements ), num_sensors )
meas['value'] =np.random.rand(num_measurements * num_sensors )
#otherwise the index is copied from the other dataframe, so is not unique
meas=meas.reset_index(drop=True)

joined = pd.merge(sensors, meas, how='outer', on='sensor id')
pt = joined.pivot_table( index= ['sensor id','address id','time'], columns=['type'], values=['value']  ).reset_index()

Upvotes: 2

Related Questions