Reputation: 15
I am trying to order the data below based on the date
Here is the code I have so far:
txt_data = """
1. Oliver Nash, DIN-23459818, 2021-09-13, Yes, Opium,
2. Mary Evans, DIN-23454678, 2021-09-09, Yes, Hydrocone,
3. Michael Ye, DIN-23456678, 2021-09-12, No,
"""
txt_lst = []
new_lst = []
newer_lst = []
for line in txt_data.splitlines():
txt_lst.append(line)
def extractDigits(lst):
return [[el] for el in lst]
new_lst = extractDigits(txt_lst)
for string in new_lst:
for item in txt_data.split(','):
newer_lst.append(item.strip())
# print(txt_lst[1])
# print(new_lst[1])
print(newer_lst)
print(newer_lst[1][0])
gives D
I am having a hard time removing the line which contains "No" and getting the date. I wanted something like new_lst[1][2]
which should giveDIN-23459818
or new_lst[1][3]
which would be 2021-09-13
Any help is appreciated.
Upvotes: 1
Views: 81
Reputation: 42793
One way with pandas:
import pandas as pd
txt_data = """
1. Oliver Nash, DIN-23459818, 2021-09-13, Yes, Opium,
2. Mary Evans, DIN-23454678, 2021-09-09, Yes, Hydrocone,
3. Michael Ye, DIN-23456678, 2021-09-12, No,
"""
t_list = list(map(str.strip, txt_data.split(",\n")))
simple_list = []
for r in t_list:
if r != '':
simple_list.append(r.split(","))
df=pd.DataFrame(simple_list,columns=['col1','col2','col3','col4','col5']).sort_values(by='col3', ascending=False).values.tolist()
print(df)
Upvotes: 0
Reputation: 5335
Use the list of dicts:
txt_data = """
1. Oliver Nash, DIN-23459818, 2021-09-13, Yes, Opium,
2. Mary Evans, DIN-23454678, 2021-09-09, Yes, Hydrocone,
3. Michael Ye, DIN-23456678, 2021-09-12, No,
"""
lines = txt_data.splitlines()
lines
['',
' 1. Oliver Nash, DIN-23459818, 2021-09-13, Yes, Opium,',
' 2. Mary Evans, DIN-23454678, 2021-09-09, Yes, Hydrocone,',
' 3. Michael Ye, DIN-23456678, 2021-09-12, No,',
' ']
list1 = []
for l in lines:
spl = [x.strip() for x in l.split(',')]
if len(spl)<5: continue
d = {}
d['name'] = spl[0]
d['number'] = spl[1]
d['date'] = spl[2]
d['yn'] = spl[3]
d['stuff'] = spl[4]
list1.append(d)
list2 = sorted(list1, key=lambda x: x['date'])
list2
[{'name': '2. Mary Evans',
'number': 'DIN-23454678',
'date': '2021-09-09',
'yn': 'Yes',
'stuff': 'Hydrocone'},
{'name': '3. Michael Ye',
'number': 'DIN-23456678',
'date': '2021-09-12',
'yn': 'No',
'stuff': ''},
{'name': '1. Oliver Nash',
'number': 'DIN-23459818',
'date': '2021-09-13',
'yn': 'Yes',
'stuff': 'Opium'}]
Upvotes: 0
Reputation: 9639
You can use pandas for this, which is especially useful if you want to do some additional data processing:
import pandas as pd
import io
txt_data = """
1. Oliver Nash, DIN-23459818, 2021-09-13, Yes, Opium,
2. Mary Evans, DIN-23454678, 2021-09-09, Yes, Hydrocone,
3. Michael Ye, DIN-23456678, 2021-09-12, No,
"""
df = pd.read_csv(io.StringIO(txt_data), names=['name', 'DIN', 'date', 'y/n', 'substance', 'empty'], skipinitialspace=True, parse_dates=['date']).drop('empty', axis=1)
df['name'] = df['name'].str.split('.').str[1].str.strip() # remove number from name column
df = df.sort_values('date').reset_index(drop=True) #sort by date
df.to_csv('output.csv') # save as csv
Output:
name | DIN | date | y/n | substance | |
---|---|---|---|---|---|
0 | Mary Evans | DIN-23454678 | 2021-09-09 00:00:00 | Yes | Hydrocone |
1 | Michael Ye | DIN-23456678 | 2021-09-12 00:00:00 | No | nan |
2 | Oliver Nash | DIN-23459818 | 2021-09-13 00:00:00 | Yes | Opium |
Upvotes: 0
Reputation: 577
There are a number of things wrong with your code. Here's what I see is currently happening in your code.
Basic variables are declared
txt_data
is split into this and saved as txt_lst
:
['', ' 1. Oliver Nash, DIN-23459818, 2021-09-13, Yes, Opium,', ' 2. Mary Evans, DIN-23454678, 2021-09-09, Yes, Hydrocone,', ' 3. Michael Ye, DIN-23456678, 2021-09-12, No,', ' ']
A function extractDigits
is defined which wraps each element of a list into a list of length 1.
extractDigits
is called to create new_lst
, which is then this:
[[''], [' 1. Oliver Nash, DIN-23459818, 2021-09-13, Yes, Opium,'], [' 2. Mary Evans, DIN-23454678, 2021-09-09, Yes, Hydrocone,'], [' 3. Michael Ye, DIN-23456678, 2021-09-12, No,'], [' ']]
Each list of length 1 is pulled from new_lst
and temporarily named string
, then never used again.
For each iteration in step 5, your original txt_data
is split by commas into this, then iterated over:
['\n 1. Oliver Nash', ' DIN-23459818', ' 2021-09-13', ' Yes', ' Opium', '\n 2. Mary Evans', ' DIN-23454678', ' 2021-09-09', ' Yes', ' Hydrocone', '\n 3. Michael Ye', ' DIN-23456678', ' 2021-09-12', ' No', '\n ']
newer_lst
be this:['1. Oliver Nash', 'DIN-23459818', '2021-09-13', 'Yes', 'Opium', '2. Mary Evans', 'DIN-23454678', '2021-09-09', 'Yes', 'Hydrocone', '3. Michael Ye', 'DIN-23456678', '2021-09-12', 'No', '', '1. Oliver Nash', 'DIN-23459818', '2021-09-13', 'Yes', 'Opium', '2. Mary Evans', 'DIN-23454678', '2021-09-09', 'Yes', 'Hydrocone', '3. Michael Ye', 'DIN-23456678', '2021-09-12', 'No', '', '1. Oliver Nash', 'DIN-23459818', '2021-09-13', 'Yes', 'Opium', '2. Mary Evans', 'DIN-23454678', '2021-09-09', 'Yes', 'Hydrocone', '3. Michael Ye', 'DIN-23456678', '2021-09-12', 'No', '', '1. Oliver Nash', 'DIN-23459818', '2021-09-13', 'Yes', 'Opium', '2. Mary Evans', 'DIN-23454678', '2021-09-09', 'Yes', 'Hydrocone', '3. Michael Ye', 'DIN-23456678', '2021-09-12', 'No', '', '1. Oliver Nash', 'DIN-23459818', '2021-09-13', 'Yes', 'Opium', '2. Mary Evans', 'DIN-23454678', '2021-09-09', 'Yes', 'Hydrocone', '3. Michael Ye', 'DIN-23456678', '2021-09-12', 'No', '']
print(newer_lst[1][0])
gives you D
because newer_lst[1]
is 'DIN-23459818', and then index
0` of that is the first letter.Do you see how that isn't doing what you would like? I believe this is what you meant for it to do:
txt_data = """
1. Oliver Nash, DIN-23459818, 2021-09-13, Yes, Opium,
2. Mary Evans, DIN-23454678, 2021-09-09, Yes, Hydrocone,
3. Michael Ye, DIN-23456678, 2021-09-12, No,
"""
txt_lst = txt_data.splitlines() # split by line
new_lst = [line.split(",") for line in txt_lst] # split each line by comma
print(new_lst[1][1]) # DIN-23459818
print(new_lst[3][2]) # 2021-09-12
Upvotes: 0
Reputation: 16526
try:
from datetime import datetime
txt_data = """
1. Oliver Nash, DIN-23459818, 2021-09-13, Yes, Opium,
2. Mary Evans, DIN-23454678, 2021-09-09, Yes, Hydrocone,
3. Michael Ye, DIN-23456678, 2021-09-12, No,
"""
date_format = r'%Y-%m-%d'
lst = [l for line in txt_data.splitlines() if (l := line.strip())]
def key_function(x):
date_string = x.split(',')[2].lstrip()
date = datetime.strptime(date_string, date_format)
return date
lst.sort(key=key_function)
for i in lst:
print(i)
output :
2. Mary Evans, DIN-23454678, 2021-09-09, Yes, Hydrocone,
3. Michael Ye, DIN-23456678, 2021-09-12, No,
1. Oliver Nash, DIN-23459818, 2021-09-13, Yes, Opium,
explanation :
First I splited the text into separate lines and I only accepted non-empty lines after stripping them. Then I sorted the list with the help of key_function
which gets the date part of each line and creates a datetime object with strptime
. Datetime objects are comparable so you will get the desired output.
Upvotes: 2