Reputation:
I have a text file that contains tide data. The first few lines are:
Thursday 4 January,11.58,0.38 meters
Thursday 4 January,16.95,0.73 meters
Thursday 4 January,23.68,0.02 meters
Friday 5 January,6.48,0.83 meters
Friday 5 January,12.42,0.33 meters
etc. and it continues for 5 more days of data.
I've turned this data into a nested list so it now prints as:
[['Thursday 4 January', 11.58, 0.38], ['Thursday 4 January', 16.95, 0.73], ['Thursday 4 January', 23.68, 0.02], ['Friday 5 January', 6.48, 0.83], ['Friday 5 January', 12.42, 0.33].....]
And so on for every line of data in the file, with the first index of each list being the day in a string, and the 2nd and 3rd index being floats.
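For reference, a minimal sketch of how such a nested list might be built from the lines shown above. The helper name `parse_tides` is hypothetical, and the sketch assumes every line has exactly three comma-separated fields with the height followed by the word "meters":

```python
def parse_tides(lines):
    """Turn 'day,time,height meters' lines into [day, time, height] lists."""
    tides = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        day, time, height = line.split(',')
        # Keep only the number in front of the ' meters' unit.
        tides.append([day, float(time), float(height.split()[0])])
    return tides


# Sample lines copied from the file excerpt; a real run would pass the open file instead.
sample = [
    "Thursday 4 January,11.58,0.38 meters",
    "Thursday 4 January,16.95,0.73 meters",
    "Friday 5 January,6.48,0.83 meters",
]
split_tides = parse_tides(sample)
print(split_tides)
```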
For each group of nested lists with the same [0] index (the day), I need to find the lowest and highest float at the [2] index (the tide height) and print these to the screen for each day.
For example:
Thursday 4 January: 0.02 meters at lowest and 0.73 meters at highest
Friday 5 January: 0.33 meters at lowest and 0.83 meters at highest
Since the lists are already in order by day, I was thinking of just counting the matching names and then comparing them manually, e.g. for 'Thursday 4 January', since there are 3 of them, I would compare split_tides[0][2] through split_tides[2][2] to find the min and max, and repeat for each day. But I thought there must be a way to automatically compare nested lists that have a matching string.
Thanks for the help.
Upvotes: 1
Views: 71
Reputation: 1930
Here is a snippet that should help:
from pandas import DataFrame
data = [['Thursday 4 January', 11.58, 0.38], ['Thursday 4 January', 16.95, 0.73], ['Thursday 4 January', 23.68, 0.02],
['Friday 5 January', 6.48, 0.83], ['Friday 5 January', 12.42, 0.33]]
df = DataFrame.from_records(data)
df.columns = ["Date", "Value", "Height"]
# Highest, then lowest, tide height for each day
print(df.groupby(['Date'])['Height'].max())
print(df.groupby(['Date'])['Height'].min())
Output:
Date
Friday 5 January      0.83
Thursday 4 January    0.73

Date
Friday 5 January      0.33
Thursday 4 January    0.02
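Both values can also be computed in a single groupby call and printed in the format the question asks for. This is a minimal sketch, assuming pandas is installed and reusing the column names from the snippet above; `sort=False` is used here so the days stay in the order they appear in the data rather than being sorted alphabetically:

```python
import pandas as pd

data = [['Thursday 4 January', 11.58, 0.38], ['Thursday 4 January', 16.95, 0.73],
        ['Thursday 4 January', 23.68, 0.02],
        ['Friday 5 January', 6.48, 0.83], ['Friday 5 January', 12.42, 0.33]]
df = pd.DataFrame(data, columns=["Date", "Value", "Height"])

# One row per day, with a 'min' and a 'max' column for the tide height.
summary = df.groupby("Date", sort=False)["Height"].agg(["min", "max"])

for day, row in summary.iterrows():
    print("{}: {} meters at lowest and {} meters at highest".format(day, row["min"], row["max"]))
```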
Upvotes: 2
Reputation: 11228
from collections import defaultdict

data = [['Thursday 4 January', 11.58, 0.38], ['Thursday 4 January', 16.95, 0.73], ['Thursday 4 January', 23.68, 0.02], ['Friday 5 January', 6.48, 0.83], ['Friday 5 January', 12.42, 0.33]]

# Group the times (index 1) and heights (index 2) of the readings by day (index 0)
x = defaultdict(lambda: {'time': [], 'height': []})
for day, time, height in data:
    x[day]['time'].append(time)
    x[day]['height'].append(height)
print(dict(x))
"""
structure data for future use
{'Thursday 4 January': {'time': [11.58, 16.95, 23.68],
                        'height': [0.38, 0.73, 0.02]},
 'Friday 5 January': {'time': [6.48, 12.42], 'height': [0.83, 0.33]}}
"""
result = []
for day in x:
    # Both the lowest and the highest tide come from the height list, not the time list
    result.append('{}: {} meters at lowest and {} meters at highest'.format(day, min(x[day]['height']), max(x[day]['height'])))
print(result)
"""output
['Thursday 4 January: 0.02 meters at lowest and 0.73 meters at highest', 'Friday 5 January: 0.33 meters at lowest and 0.83 meters at highest']
"""
Upvotes: 1
Reputation: 3744
First, you can use groupby to group the readings by date, then sort each group by its 3rd element (the height). Sample code here:
from itertools import groupby
from operator import itemgetter
data = [['Thursday 4 January', 11.58, 0.38], ['Thursday 4 January', 16.95, 0.73], ['Thursday 4 January', 23.68, 0.02],
['Friday 5 January', 6.48, 0.83], ['Friday 5 January', 12.42, 0.33]]
for k, g in groupby(data, key=itemgetter(0)):
    a = sorted(g, key=itemgetter(2))
    print('{}: {} meters at lowest and {} meters at highest'.format(k, a[0][2], a[-1][2]))
output:
Thursday 4 January: 0.02 meters at lowest and 0.73 meters at highest
Friday 5 January: 0.33 meters at lowest and 0.83 meters at highest
By the way, make sure your data is already ordered by date before using groupby, since it only groups consecutive items that share the same key.
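To illustrate why the ordering matters, here is a minimal sketch using the same readings in a deliberately shuffled order (the names `shuffled` and `ranges` are only for illustration). It also uses min and max directly instead of sorting each group:

```python
from itertools import groupby
from operator import itemgetter

# Same readings as above, shuffled so that equal days are no longer adjacent.
shuffled = [
    ['Thursday 4 January', 11.58, 0.38],
    ['Friday 5 January', 6.48, 0.83],
    ['Thursday 4 January', 16.95, 0.73],
    ['Friday 5 January', 12.42, 0.33],
    ['Thursday 4 January', 23.68, 0.02],
]

# Without ordering, groupby starts a new group every time the day changes.
unsorted_keys = [day for day, _ in groupby(shuffled, key=itemgetter(0))]
print(len(unsorted_keys))  # one group per reading here, not one per day

# Sorting by day first makes equal days adjacent, so each day forms one group.
ranges = {}
for day, group in groupby(sorted(shuffled, key=itemgetter(0)), key=itemgetter(0)):
    heights = [reading[2] for reading in group]
    ranges[day] = (min(heights), max(heights))
print(ranges)
```

Note that sorting on the day string orders the groups alphabetically, not chronologically, so if the file is already in date order it is simpler to leave it as it is.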
Hope that will help you, and comment if you have further questions. : )
Upvotes: 1