rbasu98

Reputation: 37

How do I print a simple Python statement based on Pandas dataframe?

Date        Train Number  Station 1  Station 2  Equipment Available?
2022-06-16  1111          North      Central    Y
2022-06-20  1111          North      Central    Y
2022-06-01  2222          North      South      Y
2022-06-02  2222          North      South      Y
2022-06-03  2222          North      South      Y
2022-06-04  2222          North      South      Y
2022-06-05  2222          North      South      Y
2022-06-06  2222          North      South      Y
2022-06-07  2222          North      South      Y
2022-06-08  2222          North      South      Y

I have a Pandas dataframe like the one above, sorted by Train Number and then by Date. I would like to print a simple statement for each train that says:

"For Train Number 1111 North to Central, we have equipment available on June 16th and June 20th.

For Train Number 2222 North to South, we have equipment available from June 1st to June 8th."

How can I do this?

Upvotes: 1

Views: 238

Answers (3)

Naveed

Reputation: 11650

Here is one way to do it: group by Train Number, Station 1 and Station 2, and take both the min and the max of the dates.

Then print one line for each row of the DataFrame that the groupby returns.

# Earliest and latest date per train and route (column names as in the question's table)
df2 = df.groupby(['Train Number', 'Station 1', 'Station 2'])['Date'].agg(['min', 'max']).reset_index()
for _, row in df2.iterrows():
    print("For Train Number {0} {1} to {2}, we have equipment available on {3} and {4}".format(
          row['Train Number'], row['Station 1'], row['Station 2'], row['min'], row['max']))

Output:

For Train Number 1111 North to Central, we have equipment available on 2022-06-16 and 2022-06-20
For Train Number 2222 North to South, we have equipment available on 2022-06-01 and 2022-06-08

Upvotes: 0

Drakax

Reputation: 1493

I've made a small function that you can call on any df with these columns.

I find this solution more readable and easier to adapt to further requests.

def equip_avail(df):
  for i in df['Train Number'].unique():
    rows = df.loc[df['Train Number'] == i]  # all rows for this train
    date_start = rows['Date'].min()
    date_end = rows['Date'].max()
    # The stations repeat on every row of a train, so take the first value
    from_start = rows['Station 1'].values[0]
    to_end = rows['Station 2'].values[0]
    print(f'For Train Number {i} {from_start} to {to_end}, we have equipment available from {date_start} to {date_end}.')

Then you call it like this:

equip_avail(df)

Result:

For Train Number 1111 North to Central, we have equipment available from 2022-06-16 to 2022-06-20.
For Train Number 2222 North to South, we have equipment available from 2022-06-01 to 2022-06-08.
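
The question asks for dates written like "June 16th" rather than 2022-06-16. A minimal sketch of one way to get that wording is below. `ordinal_day` is a hypothetical helper, not part of pandas; it accepts any `datetime.date`-like object, which includes the `pd.Timestamp` values you get after `pd.to_datetime`. The month name comes from `strftime('%B')`, so this assumes an English locale.

```python
from datetime import date

def ordinal_day(d):
    """Format a date as month name plus ordinal day (hypothetical helper)."""
    day = d.day
    # 11, 12 and 13 take "th" even though they end in 1, 2 and 3
    if 11 <= day % 100 <= 13:
        suffix = "th"
    else:
        suffix = {1: "st", 2: "nd", 3: "rd"}.get(day % 10, "th")
    return f"{d.strftime('%B')} {day}{suffix}"

print(ordinal_day(date(2022, 6, 16)))
print(ordinal_day(date(2022, 6, 1)))
```

Inside the function above, you could then pass `date_start` and `date_end` through this helper before printing, provided the Date column has been converted with `pd.to_datetime` first.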

Upvotes: 1

Mortz

Reputation: 4879

You could get the min and max Date for each train with a groupby, dedupe the DataFrame to get the other columns (they are repeated on every row of a train), join the two, and then print the results with some datetime formatting:

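import pandas as pd

df['Date'] = pd.to_datetime(df['Date'])

# One row per train with its route columns; this must exist before the join below
df_deduped = df.loc[:, 'Train Number':].drop_duplicates().set_index('Train Number')

# Earliest and latest date per train, then attach the station columns
g = df.groupby(['Train Number']).agg(date_min=pd.NamedAgg(column='Date', aggfunc='min'), date_max=pd.NamedAgg(column='Date', aggfunc='max'))
g = g.join(df_deduped, how='inner')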

for index, values in g.reset_index().iterrows():
    print(f'For Train Number {values["Train Number"]}, {values["Station 1"]} to {values["Station 2"]}, we have equipment available from {values["date_min"].strftime("%b %d")} to {values["date_max"].strftime("%b %d")}')

The output is:

For Train Number 1111, North to Central, we have equipment available from Jun 16 to Jun 20
For Train Number 2222, North to South, we have equipment available from Jun 01 to Jun 08
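
The question words the two trains differently: "on June 16th and June 20th" for separate days, but "from June 1st to June 8th" for an unbroken run. Taking only the min and max cannot tell those cases apart. A rough sketch of one way to do so follows; it rebuilds the sample data from the question, assumes every row is an available day (all "Y", as in the sample), and uses a hypothetical helper `describe_dates`. Ordinal suffixes ("16th") are left out to keep it short.

```python
import pandas as pd

# Sample data copied from the question's table
df = pd.DataFrame({
    "Date": pd.to_datetime(
        ["2022-06-16", "2022-06-20"] + [f"2022-06-0{d}" for d in range(1, 9)]
    ),
    "Train Number": [1111] * 2 + [2222] * 8,
    "Station 1": ["North"] * 10,
    "Station 2": ["Central"] * 2 + ["South"] * 8,
})

def describe_dates(dates):
    """Describe a datetime Series as a range or a list (hypothetical helper)."""
    dates = dates.sort_values()
    labels = [f"{d:%B} {d.day}" for d in dates]
    gaps = dates.diff().dropna()
    # An unbroken daily run: every gap between neighbours is exactly one day
    if len(labels) > 1 and (gaps == pd.Timedelta(days=1)).all():
        return f"from {labels[0]} to {labels[-1]}"
    if len(labels) == 1:
        return f"on {labels[0]}"
    return "on " + ", ".join(labels[:-1]) + " and " + labels[-1]

messages = []
for (train, start, end), group in df.groupby(["Train Number", "Station 1", "Station 2"]):
    messages.append(
        f"For Train Number {train} {start} to {end}, "
        f"we have equipment available {describe_dates(group['Date'])}."
    )

for message in messages:
    print(message)
```

With the sample data this prints "on June 16 and June 20" for train 1111 and "from June 1 to June 8" for train 2222. A train with a run plus a stray extra day would fall into the list form, which may or may not be what you want.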

Upvotes: 0
