Reputation: 4989
I want to show a table which combines two DataFrames with conditional indexing. This works with one DataFrame:
room1,weather = pd.read_excel(mypath,sheetnames[0]),pd.read_excel(mypath,sheetnames[2])
selector = (room1.Time>='08:00') & (room1.Time<='18:00')
view = ['Time','Cooling_plant_sensible_load']
room1[selector][view][:12]
which gives me something like this:
Time Cooling_plant_sensible_load
7 08:00 0.000
8 09:00 0.000
....
16 17:00 0.000
17 18:00 0.000
31 08:00 0.000
The weather
DataFrame has a Series called Dry_Bulb_Temperature
which I would like to add to the view so it shows like this
Time Cooling_plant_sensible_load Dry_Bulb_Temperature
7 08:00 0.000 18
8 09:00 0.000 22
....
16 17:00 0.000 19
17 18:00 0.000 16
31 08:00 0.000 12
I tried adding:
selector2 = (weather.Time>='08:00') & (weather.Time<='18:00')
pd.concat({'room1':room1[selector][view][:12],'wea':weather[selector2]['Dry_bulb_temperature']},axis=1)
which gave me a AttributeError: 'Series' object has no attribute '_data'
EDIT:
weather[selector2]['Dry_bulb_temperature'][:12]
looks like this:
major
7 15.3
8 16.0
9 18.0
10 19.9
11 21.9
12 22.9
13 24.0
14 25.0
15 24.8
16 24.5
17 24.3
31 16.2
Name: Dry_bulb_temperature, dtype: float64
EDIT2:
The AttributeError: 'Series' object has no attribute '_data'
is caused because weather[selector2]['Dry_bulb_temperature']
is a Series while concat expects a DataFrame which can not be concat-ed with a Dataframe, i.e. concat needs two similar types (previous comment is wrong as pointed out by @Philip below).
So I could combine the room1 DataFrame with the weather DataFrame. Is this the way to go? How do I avoid that the two 'Time' series are duplicated?
I've got a number of room(n) dataframes and was thinking that there might be a way for each to reference the same weather dataset.
Upvotes: 1
Views: 1134
Reputation: 6606
I'm not sure what is happening in your concat. It might be that you have a field name confused. I see both 'Dry_Bulb_Temperature' and 'Dry_bulb_temperature' in different parts of your question.
Assuming the two dataframes have the same index, I would concat the whole thing, then do your filter:
df = pd.concat([room1, weather[['Dry_Bulb_Temperature']]], axis=1)
df[(df['Time'] >= '08:00') & (df['Time'] <= '18:00')]
Less code and easier to read.
Upvotes: 1
Reputation: 375695
It looks like you want to do a join (which can merge a DataFrame and a Series on their index):
In [11]: df
Out[11]:
Time Cooling_plant_sensible_load Dry_Bulb_Temperature
7 08:00 0 18
8 09:00 0 22
In [12]: s
Out[12]:
7 15.3
8 16.0
Name: Dry_bulb_temperature, dtype: float64
In [13]: df.join(s)
Out[13]:
Time Cooling_plant_sensible_load Dry_Bulb_Temperature Dry_bulb_temperature
7 08:00 0 18 15.3
8 09:00 0 22 16.0
See more in the merging, join and concating section of the docs.
Note:
You can create the Series/column using loc, avoiding chaining:
s = weather.loc[selector2, 'Dry_bulb_temperature']
Upvotes: 1
Reputation: 4989
Ok, I got something that works based on @mattexx initial proposal:
#pd.concat([room1, weather], axis=1)[selector][view.append('Dry_bulb_temperature')]
df = pd.concat([room1, weather], axis=1)
# Removing duplicate columns based on this link:
# http://stackoverflow.com/questions/16938441/how-to-remove-duplicate-columns-from-a-dataframe-using-python-pandas
df = df.T.groupby(level=0).first().T
selector = [(df.Time>='08:00') & (df.Time<='18:00')]
view = ['Time','Cooling_plant_sensible_load','Dry_bulb_temperature']
df[['Time','Cooling_plant_sensible_load','Dry_bulb_temperature']][(df.Time>='08:00') & (df.Time<='18:00')][:12]
which gives:
Time Cooling_plant_sensible_load Dry_bulb_temperature
7 08:00 0 15.3
8 09:00 0 16
.......................
17 18:00 0 24.3
31 08:00 0 16.2
Not sure if this is the best way to get there, but it works for now. Thanks guys for getting me on the right track.
Upvotes: 0