Reputation: 541
I have Two Dataframes ,What I'm trying to to do , is to create new columns based on the Sensor_Type and add them to DF1 and assign each measurement based on the Sensor.
DF1:
Weather_Legends.head():
Sensor_ID Sensor_Street_Name Sensor_Lat Sensor_Long Sensor_Type UOM
0 6030 Milano - via Brera 45.471192 9.187616 Wind Direction degree
1 5897 Milano - via Brera 45.471192 9.187616 Temperature Celsius degree
2 6174 Milano - via Brera 45.471192 9.187616 Relative Humidity %
3 6120 Milano - via Brera 45.471192 9.187616 Wind Speed m/s
4 2006 Milano - via Lambrate 45.490051 9.22559 Precipitation mm
DF2:
Mi_Meteo.head():
Sensor_ID Time_Instant Measurement
0 14121 2013-11-01 01:00:00 0.8
1 14121 2013-11-01 02:00:00 0.6
2 14121 2013-11-01 03:00:00 0.4
3 14121 2013-11-01 04:00:00 0.4
4 14121 2013-11-01 05:00:00 0
And this is the desired output:
Sensor_Type Sensor_ID Sensor_Street_Name Time_Instant Precipitation Relative Humidity Wind Speed …..
0 14121 Milano - via Ippolito Rosellini 2013-11-01 01:00:00 0.8 NaN NaN
1 14121 Milano - via Ippolito Rosellin 2013-11-01 02:00:00 NaN 0.6 NaN
2 14121 Milano - via Ippolito Rosellini 2013-11-01 03:00:00 0.4 NaN NaN
.
.
.
And Instead , These Is What I get:
Sensor_Type Sensor_ID Sensor_Street_Name Time_Instant Precipitation
0 14121 Milano - via Ippolito Rosellini 2013-11-01 01:00:00 0.8
1 14121 Milano - via Ippolito Rosellini 2013-11-01 02:00:00 0.6
2 14121 Milano - via Ippolito Rosellini 2013-11-01 03:00:00 0.4
.
.
The Other Sensor Type are Missing !!!
And Here's The Code That I Used:
Mi_Meteo['Measurement'] = Mi_Meteo['Measurement'].str.rstrip(' Measure').str.strip()
Mi_Meteo['Measurement'] = pd.to_numeric(Mi_Meteo['Measurement'] ,errors='coerce' )
Mi_Meteo['Sensor_ID'] = Mi_Meteo['Sensor_ID'].str.rstrip(' ID').str.strip()
Mi_Meteo['Sensor_ID'] = pd.to_numeric(Mi_Meteo['Sensor_ID'] ,errors='coerce' )
Mi_Meteo['Measurement'] = Mi_Meteo['Measurement'].astype(float)
Mi_Meteo['Sensor_ID'] = Mi_Meteo['Sensor_ID'].astype(float)
df4 = Mi_Meteo.merge(Weather_Legends, on='Sensor_ID', how='left')\
.pivot_table(index=['Sensor_ID' ,'Sensor_Street_Name' , 'Time_Instant' ],
values= 'Measurement',
columns='Sensor_Type')\
.reset_index()
df4['Sensor_ID'] = df4['Sensor_ID'].astype(int)
Any Suggestion Would Be Much Appreciated , Thank U all.
Upvotes: 1
Views: 52
Reputation: 8970
In the absence of your data, I made it up with some random data which mirrors what I understand is the structure of yours.
The code below works and gives you what I believe you're looking for. It is similar to what you seem to have done, so I can only guess there must be some errors in the structure of your data.
This is going to sound really banal, but have you verified that there are readings for the other sensors, too (wind etc)? Based on the output, it seems there are readings only for precipitation.
Your 'desired output' doesn't make sense because it shows that the same sensor id records rain and humidity; but you said each sensor id is associated to only one sensor type!
E.g. maybe 'sensor id' is not a primary key after the formulae you have applied? Look up these concepts here: https://www.essentialsql.com/what-is-the-difference-between-a-primary-key-and-a-foreign-key/ or in one of the trillion resources available online. Check the structure of your data.
Don't mean to sound like a broken record, but, again: primary and foreign keys! You really really really really need to understand the structure of your data.
If, after going over these concepts and after reviewing your data, you still can't see what's wrong with it, you might try uploading the data to dropbox or something similar, posting a link here and hoping some of us have enough time to go over it.
import numpy as np
import pandas as pd
import random
num_sensors=int(100)
sensors= pd.DataFrame()
sensors['sensor id']=np.arange(0,num_sensors)
sensors['address id'] =np.arange(1000,1000+num_sensors)
#;ambda function not efficient on large datasets but irrelevant here
sensors['type']= sensors.apply( lambda x: "".join( [random.choice(['rain','temp','wind']) ] ), axis=1 )
num_measurements = 10
meas = pd.DataFrame()
meas['sensor id']= np.repeat(sensors['sensor id'], num_measurements )
meas['time'] = np.tile( np.arange(0,num_measurements ), num_sensors )
meas['value'] =np.random.rand(num_measurements * num_sensors )
#otherwise the index is copied from the other dataframe, so is not unique
meas=meas.reset_index(drop=True)
joined = pd.merge(sensors, meas, how='outer', on='sensor id')
pt = joined.pivot_table( index= ['sensor id','address id','time'], columns=['type'], values=['value'] ).reset_index()
Upvotes: 2