Reputation: 155

plot the relationship between two variables with pandas

I am new to Python but aware of how useful pandas is, so I would like to ask whether someone can help me use pandas to solve the problem below.

I have a dataset with buses, which looks like:

BusModel;BusID;ModeName;Value;Unit;UtcTime
Alpha;0001;Engine hours;985;h;2016-06-22 19:58:09.000
Alpha;0001;Engine hours;987;h;2016-06-22 21:58:09.000
Alpha;0001;Engine hours;989;h;2016-06-22 23:59:09.000
Alpha;0001;Fuel consumption;78;l;2016-06-22 19:58:09.000
Alpha;0001;Fuel consumption;88;l;2016-06-22 21:58:09.000
Alpha;0001;Fuel consumption;98;l;2016-06-22 23:59:09.000

The file is in .csv format and is separated by semicolons (;). I would like to plot the relationship between 'Engine hours' and 'Fuel consumption' by calculating the mean value of each per day, based on UtcTime. Moreover, I would like to plot graphs for all the buses in the dataset (not only 0001 but also 0002, 0003, etc.). How can I do that with a simple loop?

Upvotes: 2

Answers (2)

ysearka

Reputation: 3855

If you really want to use pandas, keep one rule of thumb in mind: avoid explicit Python loops over rows. They scale poorly on large DataFrames, so prefer the built-in vectorized functions. First, let's read your data into a DataFrame:

import pandas as pd
data = pd.read_csv('bus.csv', sep=';')

Here is the weak point of my answer: I don't know how to handle dates efficiently. So create a column named day that contains the day from UtcTime. (I would use an apply method like data['day'] = data['UtcTime'].apply(lambda x: x[:10]), but that is a hidden loop, so avoid it if you can.)
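One vectorized way to build the day column is sketched below. It is not part of the original answer: it assumes UtcTime is parsed as a datetime when the file is read, and it uses the sample rows from the question through io.StringIO in place of bus.csv so the snippet runs on its own.

```python
import io
import pandas as pd

# Sample rows copied from the question; in practice pass 'bus.csv' to read_csv.
csv_text = """BusModel;BusID;ModeName;Value;Unit;UtcTime
Alpha;0001;Engine hours;985;h;2016-06-22 19:58:09.000
Alpha;0001;Engine hours;987;h;2016-06-22 21:58:09.000
Alpha;0001;Engine hours;989;h;2016-06-22 23:59:09.000
Alpha;0001;Fuel consumption;78;l;2016-06-22 19:58:09.000
Alpha;0001;Fuel consumption;88;l;2016-06-22 21:58:09.000
Alpha;0001;Fuel consumption;98;l;2016-06-22 23:59:09.000
"""

# parse_dates turns UtcTime into a datetime64 column while reading.
data = pd.read_csv(io.StringIO(csv_text), sep=';', parse_dates=['UtcTime'])

# .dt.normalize() sets the time part to midnight for the whole column at once,
# so every timestamp from the same calendar day gets the same value.
data['day'] = data['UtcTime'].dt.normalize()
print(data[['UtcTime', 'day']])
```

The day column keeps the datetime dtype, which makes it convenient as a groupby key and as a plot axis later on.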

Then, to take only the data of a single bus, filter with a boolean mask (read_csv parses the ID 0001 as the integer 1):

data_bus1 = data[data.BusID == 1]

Finally, use the groupby function:

data_bus1[['ModeName','Value','day']].groupby(['ModeName','day'],as_index = False).mean()

Or, if you don't need to separate your buses into different DataFrames, you can apply groupby to the whole dataset:

data[['BusID','ModeName','Value','day']].groupby(['BusID','ModeName','day'],as_index = False).mean()
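To relate the two quantities, it helps to have 'Engine hours' and 'Fuel consumption' side by side as columns. The sketch below shows one way to get there with unstack; it is an addition to the answer, and it assumes the day column is derived from a parsed UtcTime. The sample rows from the question stand in for bus.csv, so the result contains a single bus and a single day.

```python
import io
import pandas as pd

# Sample rows copied from the question; in practice pass 'bus.csv' to read_csv.
csv_text = """BusModel;BusID;ModeName;Value;Unit;UtcTime
Alpha;0001;Engine hours;985;h;2016-06-22 19:58:09.000
Alpha;0001;Engine hours;987;h;2016-06-22 21:58:09.000
Alpha;0001;Engine hours;989;h;2016-06-22 23:59:09.000
Alpha;0001;Fuel consumption;78;l;2016-06-22 19:58:09.000
Alpha;0001;Fuel consumption;88;l;2016-06-22 21:58:09.000
Alpha;0001;Fuel consumption;98;l;2016-06-22 23:59:09.000
"""

data = pd.read_csv(io.StringIO(csv_text), sep=';', parse_dates=['UtcTime'])
data['day'] = data['UtcTime'].dt.normalize()

# Mean Value per bus, day and mode, then move ModeName from the index
# into the columns: one row per (BusID, day), one column per mode.
daily = (
    data.groupby(['BusID', 'day', 'ModeName'])['Value']
        .mean()
        .unstack('ModeName')
)
print(daily)
```

Each row of daily is then one point of the relationship: the mean engine hours and the mean fuel consumption of one bus on one day.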

Upvotes: 1

Syafiq Kamarul Azman

Reputation: 862

Start with the following in interactive mode:

import pandas as pd

df = pd.read_csv('bus.csv', sep=";", parse_dates=['UtcTime'])

You should now be able to play around with the DataFrame and discover functions you can use directly on the data. To get the rows for a single bus by ID, just do:

>>> bus1 = df[df.BusID == 1]
>>> bus1

Substitute 1 with the ID of the bus you require. This returns a sub-DataFrame. To get only the engine hours for BusID 1, do:

>>> bus1[bus1.ModeName == "Engine hours"]

You can quickly get statistics of columns by doing

>>> bus1.Value.describe()
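Note that Value holds both hours and litres, so statistics over the whole column mix the two units. A minimal sketch of describing each mode separately follows; it is an addition, and it uses the sample rows from the question in place of bus.csv.

```python
import io
import pandas as pd

# Sample rows copied from the question; in practice pass 'bus.csv' to read_csv.
csv_text = """BusModel;BusID;ModeName;Value;Unit;UtcTime
Alpha;0001;Engine hours;985;h;2016-06-22 19:58:09.000
Alpha;0001;Engine hours;987;h;2016-06-22 21:58:09.000
Alpha;0001;Engine hours;989;h;2016-06-22 23:59:09.000
Alpha;0001;Fuel consumption;78;l;2016-06-22 19:58:09.000
Alpha;0001;Fuel consumption;88;l;2016-06-22 21:58:09.000
Alpha;0001;Fuel consumption;98;l;2016-06-22 23:59:09.000
"""

df = pd.read_csv(io.StringIO(csv_text), sep=';', parse_dates=['UtcTime'])
bus1 = df[df.BusID == 1]

# Group by ModeName first so each row of the summary covers a single unit.
stats = bus1.groupby('ModeName')['Value'].describe()
print(stats)
```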

Once you have selected the data you need, you can start plotting:

>>> import matplotlib.pyplot as plt
>>> bus1[bus1.ModeName == "Engine hours"].plot(x="UtcTime", y="Value")
>>> bus1[bus1.ModeName == "Fuel consumption"].plot(x="UtcTime", y="Value")
>>> plt.show()

There is more explanation in the docs. Please refer to http://pandas.pydata.org/pandas-docs/stable/.
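For the original goal (daily means of both quantities, one graph per bus), a simple loop over a groupby is one option. The sketch below is an addition rather than the answerer's method: the figures dictionary and the plot titles are illustrative choices, and the sample rows from the question stand in for bus.csv, so only bus 1 with a single day is plotted.

```python
import io
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; remove this line to open windows
import matplotlib.pyplot as plt
import pandas as pd

# Sample rows copied from the question; in practice pass 'bus.csv' to read_csv.
csv_text = """BusModel;BusID;ModeName;Value;Unit;UtcTime
Alpha;0001;Engine hours;985;h;2016-06-22 19:58:09.000
Alpha;0001;Engine hours;987;h;2016-06-22 21:58:09.000
Alpha;0001;Engine hours;989;h;2016-06-22 23:59:09.000
Alpha;0001;Fuel consumption;78;l;2016-06-22 19:58:09.000
Alpha;0001;Fuel consumption;88;l;2016-06-22 21:58:09.000
Alpha;0001;Fuel consumption;98;l;2016-06-22 23:59:09.000
"""

df = pd.read_csv(io.StringIO(csv_text), sep=';', parse_dates=['UtcTime'])
df['day'] = df['UtcTime'].dt.normalize()

# Daily mean per bus and mode, with one column per mode.
daily = (
    df.groupby(['BusID', 'day', 'ModeName'])['Value']
      .mean()
      .unstack('ModeName')
)

# One scatter plot per bus: each point is one day.
figures = {}  # hypothetical container, keyed by BusID
for bus_id, bus_daily in daily.groupby(level='BusID'):
    ax = bus_daily.plot.scatter(
        x='Engine hours', y='Fuel consumption', title=f'Bus {bus_id}'
    )
    figures[bus_id] = ax.get_figure()

# Call plt.show() here, or figures[bus_id].savefig(...), to view or keep the plots.
```

The loop runs once per bus, not once per row, so it stays cheap even on a large file; the heavy lifting is done by groupby.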

Upvotes: 2
