Reputation: 155
I am new to Python but am aware of how useful pandas is, so I would like to ask if someone can help me use pandas to solve the problem below.
I have a dataset with buses, which looks like:
BusModel;BusID;ModeName;Value;Unit;UtcTime
Alpha;0001;Engine hours;985;h;2016-06-22 19:58:09.000
Alpha;0001;Engine hours;987;h;2016-06-22 21:58:09.000
Alpha;0001;Engine hours;989;h;2016-06-22 23:59:09.000
Alpha;0001;Fuel consumption;78;l;2016-06-22 19:58:09.000
Alpha;0001;Fuel consumption;88;l;2016-06-22 21:58:09.000
Alpha;0001;Fuel consumption;98;l;2016-06-22 23:59:09.000
The file is in .csv format and is separated by semicolons (;). I would like to plot the relationship between ‘Engine hours’ and ‘Fuel consumption’ by calculating the mean value of both for each day, based on UtcTime. Moreover, I would like to plot graphs for all the buses in the dataset (not only 0001 but also 0002, 0003, etc.). How can I do that with a simple loop?
Upvotes: 2
Views: 3412
Reputation: 3855
If you really want to use pandas, remember this simple thing: avoid explicit loops wherever you can. Loops over rows don't scale well, so try to use the built-in functions instead. First let's read your data into a dataframe:
import pandas as pd
data = pd.read_csv('bus.csv',sep = ';')
Here is the weak point of my answer: I don't know how to handle dates efficiently. So create a column named day which contains the day from UtcTime. I would use an apply method like this:
data['day'] = data['UtcTime'].apply(lambda x: x[:10])
but it's a hidden loop, so don't do that if you can avoid it!
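One way to build the day column without apply is to parse UtcTime into real timestamps and truncate them with the .dt accessor. This is only a sketch: it uses the sample rows from the question in place of bus.csv, and the names csv_text and day are just illustrative.

```python
import io
import pandas as pd

# Sample rows copied from the question; in practice read 'bus.csv' instead.
csv_text = """BusModel;BusID;ModeName;Value;Unit;UtcTime
Alpha;0001;Engine hours;985;h;2016-06-22 19:58:09.000
Alpha;0001;Engine hours;987;h;2016-06-22 21:58:09.000
Alpha;0001;Engine hours;989;h;2016-06-22 23:59:09.000
Alpha;0001;Fuel consumption;78;l;2016-06-22 19:58:09.000
Alpha;0001;Fuel consumption;88;l;2016-06-22 21:58:09.000
Alpha;0001;Fuel consumption;98;l;2016-06-22 23:59:09.000
"""
data = pd.read_csv(io.StringIO(csv_text), sep=';')

# Convert the strings to timestamps, then round each one down to midnight.
data['UtcTime'] = pd.to_datetime(data['UtcTime'])
data['day'] = data['UtcTime'].dt.floor('D')
```

Because day is a real datetime column rather than a string, it sorts and plots correctly on a time axis.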
Then to take only the data of a single bus, try a slicing method:
data_bus1 = data[data.BusID == 1]
Finally use the groupby function:
data_bus1[['ModeName','Value','day']].groupby(['ModeName','day'],as_index = False).mean()
Or if you don't need to separate your buses into different dataframes, you can use groupby on the whole data:
data[['BusID','ModeName','Value','day']].groupby(['BusID','ModeName','day'],as_index = False).mean()
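The grouped result is in long form, with one row per bus, mode and day. To relate engine hours to fuel consumption it may be handier to have the two modes side by side as columns. A possible sketch using pivot_table, again on the sample rows from the question (the names csv_text and daily are illustrative, not from the original post):

```python
import io
import pandas as pd

# Sample rows copied from the question; in practice read 'bus.csv' instead.
csv_text = """BusModel;BusID;ModeName;Value;Unit;UtcTime
Alpha;0001;Engine hours;985;h;2016-06-22 19:58:09.000
Alpha;0001;Engine hours;987;h;2016-06-22 21:58:09.000
Alpha;0001;Engine hours;989;h;2016-06-22 23:59:09.000
Alpha;0001;Fuel consumption;78;l;2016-06-22 19:58:09.000
Alpha;0001;Fuel consumption;88;l;2016-06-22 21:58:09.000
Alpha;0001;Fuel consumption;98;l;2016-06-22 23:59:09.000
"""
data = pd.read_csv(io.StringIO(csv_text), sep=';', parse_dates=['UtcTime'])
data['day'] = data['UtcTime'].dt.floor('D')

# One row per (BusID, day), one column per ModeName, each cell a daily mean.
daily = data.pivot_table(
    index=['BusID', 'day'],
    columns='ModeName',
    values='Value',
    aggfunc='mean',
).reset_index()
```

Note that read_csv parses 0001 as the integer 1, which is why the comparison data.BusID == 1 works; pass dtype={'BusID': str} if you need to keep the leading zeros.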
Upvotes: 1
Reputation: 862
Start with the following in interactive mode:
import pandas as pd
df = pd.read_csv('bus.csv', sep=";", parse_dates=['UtcTime'])
You should now be able to start playing around with the DataFrame and discovering functions you can use directly on the data. To get the rows for a single bus by ID just do:
>>> bus1 = df[df.BusID == 1]
>>> bus1
Substitute 1 with the ID of the bus you require. This will return a sub-DataFrame. To get only the engine hours of BusID 1 do:
>>> bus1[bus1.ModeName == "Engine hours"]
You can quickly get summary statistics for a column by doing:
>>> bus1.Value.describe()
Once you have selected the data you need you can start plotting:
>>> import matplotlib.pyplot as plt
>>> bus1[bus1.ModeName == "Engine hours"].plot(x="UtcTime", y="Value")
>>> bus1[bus1.ModeName == "Fuel consumption"].plot(x="UtcTime", y="Value")
>>> plt.show()
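To cover every bus, as the question asks, a short loop over df.groupby('BusID') is reasonable here since it runs once per bus, not once per row. The following is a rough sketch on the sample rows from the question. The helper names daily_means and plot_all_buses are made up for illustration, and matplotlib is assumed to be installed for the plotting part.

```python
import io
import pandas as pd

# Sample rows copied from the question; in practice read 'bus.csv' instead.
csv_text = """BusModel;BusID;ModeName;Value;Unit;UtcTime
Alpha;0001;Engine hours;985;h;2016-06-22 19:58:09.000
Alpha;0001;Engine hours;987;h;2016-06-22 21:58:09.000
Alpha;0001;Engine hours;989;h;2016-06-22 23:59:09.000
Alpha;0001;Fuel consumption;78;l;2016-06-22 19:58:09.000
Alpha;0001;Fuel consumption;88;l;2016-06-22 21:58:09.000
Alpha;0001;Fuel consumption;98;l;2016-06-22 23:59:09.000
"""
df = pd.read_csv(io.StringIO(csv_text), sep=';', parse_dates=['UtcTime'])


def daily_means(bus):
    """Return one row per day with a column of mean values for each ModeName."""
    by_mode_and_day = ['ModeName', pd.Grouper(key='UtcTime', freq='D')]
    return bus.groupby(by_mode_and_day)['Value'].mean().unstack('ModeName')


# One small table per bus, keyed by BusID.
tables = {bus_id: daily_means(bus) for bus_id, bus in df.groupby('BusID')}


def plot_all_buses(tables):
    """Draw one figure per bus: daily mean fuel consumption vs. engine hours."""
    # Imported here so the tables above can be built without matplotlib.
    import matplotlib.pyplot as plt

    for bus_id, table in tables.items():
        ax = table.plot(x='Engine hours', y='Fuel consumption',
                        marker='o', title=f'Bus {bus_id}')
        ax.set_xlabel('Mean engine hours per day (h)')
        ax.set_ylabel('Mean fuel consumption per day (l)')
    plt.show()
```

Calling plot_all_buses(tables) then opens one figure per bus. With only a single day of data, as in the sample, each figure contains a single point.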
There is more explanation in the docs. Please refer to http://pandas.pydata.org/pandas-docs/stable/.
Upvotes: 2