David Cober
David Cober

Reputation: 15

Python: Ordering data based on date

I am trying to order the data below based on the date

Here is the code I have so far:

txt_data = """
        1. Oliver Nash, DIN-23459818, 2021-09-13, Yes, Opium,
        2. Mary Evans, DIN-23454678, 2021-09-09, Yes, Hydrocone,
        3. Michael Ye, DIN-23456678, 2021-09-12, No,
        """
        
txt_lst = []
new_lst = []
newer_lst = []
        
for line in txt_data.splitlines():
      txt_lst.append(line)
              
def extractDigits(lst):
      return [[el] for el in lst]
        
new_lst = extractDigits(txt_lst)
        
for string in new_lst:
    for item in txt_data.split(','):
        newer_lst.append(item.strip())
        
# print(txt_lst[1])
# print(new_lst[1])
print(newer_lst)

print(newer_lst[1][0]) gives D

I am having a hard time removing the line which contains "No" and getting the date. I wanted something like new_lst[1][2] which should giveDIN-23459818 or new_lst[1][3] which would be 2021-09-13

Any help is appreciated.

Upvotes: 1

Views: 81

Answers (5)

Oto Shavadze
Oto Shavadze

Reputation: 42793

One way with pandas:

import pandas as pd
txt_data = """
        1. Oliver Nash, DIN-23459818, 2021-09-13, Yes, Opium,
        2. Mary Evans, DIN-23454678, 2021-09-09, Yes, Hydrocone,
        3. Michael Ye, DIN-23456678, 2021-09-12, No,
        """

t_list = list(map(str.strip, txt_data.split(",\n")))
simple_list = []
for r in t_list:
    if r != '':
        simple_list.append(r.split(","))

df=pd.DataFrame(simple_list,columns=['col1','col2','col3','col4','col5']).sort_values(by='col3', ascending=False).values.tolist()
print(df)

Upvotes: 0

Michael
Michael

Reputation: 5335

Use the list of dicts:

txt_data = """ 
  1. Oliver Nash, DIN-23459818, 2021-09-13, Yes, Opium, 
  2. Mary Evans, DIN-23454678, 2021-09-09, Yes, Hydrocone, 
  3. Michael Ye, DIN-23456678, 2021-09-12, No, 
  """ 
            

lines = txt_data.splitlines()                                                                                                                                                        

lines                                                                                                                                                                                

['',
'        1. Oliver Nash, DIN-23459818, 2021-09-13, Yes, Opium,',
'        2. Mary Evans, DIN-23454678, 2021-09-09, Yes, Hydrocone,',
'        3. Michael Ye, DIN-23456678, 2021-09-12, No,',
'        ']

list1 = []                                                                                                                                                                     

for l in lines: 
    spl = [x.strip() for x in l.split(',')]
    if len(spl)<5: continue 
    d = {} 
    d['name'] = spl[0] 
    d['number'] = spl[1] 
    d['date'] = spl[2] 
    d['yn'] = spl[3] 
    d['stuff'] = spl[4] 
    list1.append(d)

list2 = sorted(list1, key=lambda x: x['date'])                                                                                                                                

list2                                                                                                                                                                               

[{'name': '2. Mary Evans',
'number': 'DIN-23454678',
'date': '2021-09-09',
'yn': 'Yes',
'stuff': 'Hydrocone'},
{'name': '3. Michael Ye',
'number': 'DIN-23456678',
'date': '2021-09-12',
'yn': 'No',
'stuff': ''},
{'name': '1. Oliver Nash',
'number': 'DIN-23459818',
'date': '2021-09-13',
'yn': 'Yes',
'stuff': 'Opium'}]

Upvotes: 0

RJ Adriaansen
RJ Adriaansen

Reputation: 9639

You can use pandas for this, which is especially useful if you want to do some additional data processing:

import pandas as pd
import io

txt_data = """
        1. Oliver Nash, DIN-23459818, 2021-09-13, Yes, Opium,
        2. Mary Evans, DIN-23454678, 2021-09-09, Yes, Hydrocone,
        3. Michael Ye, DIN-23456678, 2021-09-12, No,
        """

df = pd.read_csv(io.StringIO(txt_data), names=['name', 'DIN', 'date', 'y/n', 'substance', 'empty'], skipinitialspace=True, parse_dates=['date']).drop('empty', axis=1)
df['name'] = df['name'].str.split('.').str[1].str.strip() # remove number from name column
df = df.sort_values('date').reset_index(drop=True) #sort by date
df.to_csv('output.csv') # save as csv

Output:

name DIN date y/n substance
0 Mary Evans DIN-23454678 2021-09-09 00:00:00 Yes Hydrocone
1 Michael Ye DIN-23456678 2021-09-12 00:00:00 No nan
2 Oliver Nash DIN-23459818 2021-09-13 00:00:00 Yes Opium

Upvotes: 0

rkechols
rkechols

Reputation: 577

There are a number of things wrong with your code. Here's what I see is currently happening in your code.

  1. Basic variables are declared

  2. txt_data is split into this and saved as txt_lst:

['', '        1. Oliver Nash, DIN-23459818, 2021-09-13, Yes, Opium,', '        2. Mary Evans, DIN-23454678, 2021-09-09, Yes, Hydrocone,', '        3. Michael Ye, DIN-23456678, 2021-09-12, No,', '        ']
  1. A function extractDigits is defined which wraps each element of a list into a list of length 1.

  2. extractDigits is called to create new_lst, which is then this:

[[''], ['        1. Oliver Nash, DIN-23459818, 2021-09-13, Yes, Opium,'], ['        2. Mary Evans, DIN-23454678, 2021-09-09, Yes, Hydrocone,'], ['        3. Michael Ye, DIN-23456678, 2021-09-12, No,'], ['        ']]
  1. Each list of length 1 is pulled from new_lst and temporarily named string, then never used again.

  2. For each iteration in step 5, your original txt_data is split by commas into this, then iterated over:

['\n        1. Oliver Nash', ' DIN-23459818', ' 2021-09-13', ' Yes', ' Opium', '\n        2. Mary Evans', ' DIN-23454678', ' 2021-09-09', ' Yes', ' Hydrocone', '\n        3. Michael Ye', ' DIN-23456678', ' 2021-09-12', ' No', '\n        ']
  1. For each iteration in step 6, you strip the given string and put them all into one big list, repeated for every iteration of step 5, which makes newer_lst be this:
['1. Oliver Nash', 'DIN-23459818', '2021-09-13', 'Yes', 'Opium', '2. Mary Evans', 'DIN-23454678', '2021-09-09', 'Yes', 'Hydrocone', '3. Michael Ye', 'DIN-23456678', '2021-09-12', 'No', '', '1. Oliver Nash', 'DIN-23459818', '2021-09-13', 'Yes', 'Opium', '2. Mary Evans', 'DIN-23454678', '2021-09-09', 'Yes', 'Hydrocone', '3. Michael Ye', 'DIN-23456678', '2021-09-12', 'No', '', '1. Oliver Nash', 'DIN-23459818', '2021-09-13', 'Yes', 'Opium', '2. Mary Evans', 'DIN-23454678', '2021-09-09', 'Yes', 'Hydrocone', '3. Michael Ye', 'DIN-23456678', '2021-09-12', 'No', '', '1. Oliver Nash', 'DIN-23459818', '2021-09-13', 'Yes', 'Opium', '2. Mary Evans', 'DIN-23454678', '2021-09-09', 'Yes', 'Hydrocone', '3. Michael Ye', 'DIN-23456678', '2021-09-12', 'No', '', '1. Oliver Nash', 'DIN-23459818', '2021-09-13', 'Yes', 'Opium', '2. Mary Evans', 'DIN-23454678', '2021-09-09', 'Yes', 'Hydrocone', '3. Michael Ye', 'DIN-23456678', '2021-09-12', 'No', '']
  1. print(newer_lst[1][0]) gives you D because newer_lst[1] is 'DIN-23459818', and then index 0` of that is the first letter.

Do you see how that isn't doing what you would like? I believe this is what you meant for it to do:

txt_data = """
        1. Oliver Nash, DIN-23459818, 2021-09-13, Yes, Opium,
        2. Mary Evans, DIN-23454678, 2021-09-09, Yes, Hydrocone,
        3. Michael Ye, DIN-23456678, 2021-09-12, No,
        """

txt_lst = txt_data.splitlines()  # split by line
new_lst = [line.split(",") for line in txt_lst]  # split each line by comma

print(new_lst[1][1])  # DIN-23459818
print(new_lst[3][2])  # 2021-09-12

Upvotes: 0

S.B
S.B

Reputation: 16526

try:

from datetime import datetime

txt_data = """
        1. Oliver Nash, DIN-23459818, 2021-09-13, Yes, Opium,
        2. Mary Evans, DIN-23454678, 2021-09-09, Yes, Hydrocone,
        3. Michael Ye, DIN-23456678, 2021-09-12, No,
        """

date_format = r'%Y-%m-%d'


lst = [l for line in txt_data.splitlines() if (l := line.strip())]

def key_function(x):
    date_string = x.split(',')[2].lstrip()
    date = datetime.strptime(date_string, date_format)
    return date

lst.sort(key=key_function)

for i in lst:
    print(i)

output :

2. Mary Evans, DIN-23454678, 2021-09-09, Yes, Hydrocone,
3. Michael Ye, DIN-23456678, 2021-09-12, No,
1. Oliver Nash, DIN-23459818, 2021-09-13, Yes, Opium,

explanation :

First I splited the text into separate lines and I only accepted non-empty lines after stripping them. Then I sorted the list with the help of key_function which gets the date part of each line and creates a datetime object with strptime. Datetime objects are comparable so you will get the desired output.

Upvotes: 2

Related Questions