I have a CSV file and I want to create a dictionary from its data. I should not use pandas. The data looks like this:
This is what I tried, but the number of values differs from row to row. How can I create a dictionary?
filename = "data.dat"
file = open(filename, encoding="latin-1").read().split(' , ')
dictt = {}
for row in file:
    dictt[row[0]] = {'values', row[1]}
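The attempt above calls .read().split(' , ') on the whole file, so each row is a fragment of the text and row[0] is only its first character. A minimal sketch of per-line parsing with the standard-library csv module, assuming each line has the layout "<date> , v1,v2,..." (as in the sample data in the first answer) and that the first line is a "Date,weight" header; the function name load_measurements is hypothetical:

```python
import csv
import io
from typing import Dict, List, TextIO


def load_measurements(fp: TextIO) -> Dict[str, List[float]]:
    """Map each date string to the list of values on its line."""
    reader = csv.reader(fp)
    next(reader, None)  # skip the assumed "Date,weight" header line
    data: Dict[str, List[float]] = {}
    for row in reader:
        if not row:
            continue  # csv.reader yields [] for blank lines
        date = row[0].strip()
        # Rows have different lengths; an empty list means no measurements
        data[date] = [float(v) for v in row[1:] if v.strip()]
    return data


# In-memory stand-in for the real file, using lines from the sample data
sample = io.StringIO(
    "Date,weight\n"
    "2020-06-14 00:00:00+03:00 , 91.5,91.5,91.65,91.5,91.5,92.9,92.9\n"
    "2020-06-20 00:00:00+03:00\n"
)
print(load_measurements(sample))
```

With the real file, the same function would be called as load_measurements(open("data.dat", encoding="latin-1")), ideally inside a with block. Because every row keeps its own list, the differing number of values per row is no longer a problem.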
Given a file like the one above, I first need to create a dict. After that, I will print the daily number of unique measurements in descending order according to date.
Final expected result:
The code below should work.
from typing import List, Tuple
def create_data_dictionary(keys, values) -> List[Tuple[str, int]]:
    # Pair each date with its count of unique values
    data_dictionary = []
    for key, value in zip(keys, values):
        data_dictionary.append((key, value))
    return data_dictionary
def parser(path: str) -> List[Tuple[str, int]]:
    _keys = []
    _values = []
    with open(path, "r") as fptr:
        fptr.readline()  # skip the "Date,weight" header
        for line in fptr:
            row = line.strip().split(",")
            _keys.append(row[0].strip())
            # Distinct weights on this line (an empty set if there are none)
            st = set(weight.strip() for weight in row[1:])
            _values.append(len(st))
    return create_data_dictionary(_keys, _values)
if __name__ == "__main__":
    path = "resources/test2.csv"
    data_dictionary = parser(path)
    # Highest number of unique measurements first
    data_dictionary.sort(key=lambda x: x[1], reverse=True)
    print(data_dictionary)
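Note that create_data_dictionary returns a list of (date, count) tuples rather than a dict. If an actual dictionary is needed, a minimal sketch, assuming the (date, count) pairs produced by parser, is to sort the pairs and pass them to dict(); since Python 3.7 dictionaries preserve insertion order, so the descending order survives. The helper name to_sorted_dict is hypothetical:

```python
from typing import Dict, List, Tuple


def to_sorted_dict(pairs: List[Tuple[str, int]]) -> Dict[str, int]:
    """Build a dict ordered by count, highest first (dicts keep insertion order)."""
    ordered = sorted(pairs, key=lambda item: item[1], reverse=True)
    return dict(ordered)


# A few (date, unique-count) pairs taken from this answer's output
pairs = [
    ("2020-06-12 00:00:00+03:00", 5),
    ("2020-06-16 00:00:00+03:00", 7),
    ("2020-06-20 00:00:00+03:00", 0),
]
counts = to_sorted_dict(pairs)
for date, count in counts.items():
    print(date, count)
```

Lookups such as counts["2020-06-16 00:00:00+03:00"] then work directly, which the list of tuples does not support.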
Below is the data I parsed to create the data dictionary.
Date,weight
2020-06-12 00:00:00+03:00 , 91.5,91.9,91.9,91.9,92.55,92.55,92.1,92.1,93.3,93.3
2020-06-13 00:00:00+03:00 , 91.6,91.6,92.85,92.85,92.3,92.3,92.1,92.1,94.1,94.1
2020-06-14 00:00:00+03:00 , 91.5,91.5,91.65,91.5,91.5,92.9,92.9
2020-06-15 00:00:00+03:00 , 91.85,91.85,91.6,91.6,91.85,92.55,92.4,92.4,93.7,93.7,93.35,93.35
2020-06-16 00:00:00+03:00 , 91.5,91.9,91.9,91.9,92.55,92.55,92.1,92.1,93.3,93.3,98.7,94.7
2020-06-17 00:00:00+03:00 , 91.5,91.9,91.9,91.9,92.55,92.55,92.1,92.1,93.3,93.3,94.0
2020-06-18 00:00:00+03:00 , 91.5,91.9,91.9,91.9,92.55,92.55
2020-06-19 00:00:00+03:00 , 91.5,91.9,91.9,91.9,92.55,92.55,92.1,92.1,93.3,93.3
2020-06-20 00:00:00+03:00
Here is what the output looks like.
[('2020-06-16 00:00:00+03:00', 7), ('2020-06-15 00:00:00+03:00', 6), ('2020-06-17 00:00:00+03:00', 6), ('2020-06-12 00:00:00+03:00', 5), ('2020-06-13 00:00:00+03:00', 5), ('2020-06-19 00:00:00+03:00', 5), ('2020-06-14 00:00:00+03:00', 3), ('2020-06-18 00:00:00+03:00', 3), ('2020-06-20 00:00:00+03:00', 0)]
You can then print the output as needed or store it in a text/CSV file.
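A minimal sketch of the CSV option using the standard-library csv module; the output filename unique_counts.csv and the helper name write_counts are hypothetical, and the input is assumed to be the list of (date, count) tuples printed above:

```python
import csv
from typing import Iterable, Tuple


def write_counts(path: str, rows: Iterable[Tuple[str, int]]) -> None:
    """Write (date, unique_count) pairs to a CSV file with a header row."""
    # newline="" stops the csv module from writing extra blank lines on Windows
    with open(path, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["date", "unique_count"])
        writer.writerows(rows)


write_counts("unique_counts.csv", [("2020-06-16 00:00:00+03:00", 7),
                                   ("2020-06-15 00:00:00+03:00", 6)])
```

Passing data_dictionary from the answer's code as rows would write every date in the already-sorted order.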
This should do what you want:
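with open("./test.txt") as myFile:
    formattedData = dict()
    for line in myFile:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            # A data line looks like "<date> , v1,v2,..."
            date, numbers = line.split(' , ')
            # Count the distinct values on this line
            formattedData[date] = len(set(n.strip() for n in numbers.split(',')))
        except ValueError:
            # No " , " separator: a date with no measurements
            formattedData[line] = 0
print(formattedData)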
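This builds the dictionary but does not sort it. To get the descending order the question asks for, one option is sorted() on the dict's items; the sketch below assumes formattedData maps each date to its unique count, as in this answer, and uses a small stand-in dict:

```python
# Stand-in for the formattedData dict built above (counts from the sample data)
formattedData = {
    "2020-06-12 00:00:00+03:00": 5,
    "2020-06-16 00:00:00+03:00": 7,
    "2020-06-14 00:00:00+03:00": 3,
}

# Sort (date, count) pairs by count, largest first
by_count = sorted(formattedData.items(), key=lambda item: item[1], reverse=True)
for date, count in by_count:
    print(f"{date}: {count}")
```

If "according to date" means newest date first instead, sorting on item[0] works here, because date strings in this fixed format with the same UTC offset sort chronologically as plain strings.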
First, do not call your variables 'list': that is the name of a Python built-in (not a keyword), so shadowing it causes confusion and breaks later calls to list(). I can't reproduce this because I don't have the file, but I think changing the line in the for loop to this should work:
newvariablename[row[0]] = row[1]
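That line only does what is intended once each row comes from a single line of the file, rather than from splitting the whole text on ' , ' as in the question. A minimal sketch, assuming the "<date> , v1,v2,..." line layout from the sample data; measurements stands in for newvariablename, and the in-memory fake_file is a hypothetical substitute for the real file:

```python
import io

# Stand-in for open("data.dat", encoding="latin-1"), same layout as the sample data
fake_file = io.StringIO(
    "2020-06-18 00:00:00+03:00 , 91.5,91.9,91.9,91.9,92.55,92.55\n"
    "2020-06-20 00:00:00+03:00\n"
)

measurements = {}
for line in fake_file:
    # Split each line once on " , ": row[0] is the date, row[1] the values (if any)
    row = line.strip().split(" , ", 1)
    values = row[1].split(",") if len(row) > 1 else []
    measurements[row[0]] = values

print(measurements)
```

Unlike the bare row[1] above, this guards against dates that have no values, which would otherwise raise an IndexError.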