Reputation: 1
Maybe someone can help me? I have to write a function that returns a dictionary built from a table. The keys are the values found in one column (sex, age, location, corona test result or date), and the dictionary values are how often each of them occurs in that column.
[['N', '25-29', 'Eesti', 'Harju maakond', 'N', '06.03.2020'],
['N', '35-39', 'Eesti', 'Harju maakond', 'N', '06.03.2020'],
['N', '40-44', 'Eesti', 'Saare maakond', 'N', '06.03.2020'],
['N', '35-39', 'Eesti', 'Tartu maakond', 'N', '06.03.2020'],
['M', '40-44', 'Eesti', 'Harju maakond', 'N', '06.03.2020']]
Here is my code:
def erinevused(faili_nimi, i):
    with open(faili_nimi, encoding = "UTF-8") as fail:
        read = fail.read().split(";")
        sõnastik = {i: read.count(i) for i in read}
        return sõnastik
It returns the frequencies of every value in the file:
{'N': 6, '25-29': 1, 'Eesti': 5, 'Harju maakond': 3, '06.03.2020\nN': 3, '35-39': 2, '40-44': 2, 'Saare maakond': 1, 'Tartu maakond': 1, '06.03.2020\nM': 1, '06.03.2020': 1}
But I only need the values from column i, like here (i starts from 1, not 0):
erinevused('andmed.txt', 2)
{'25-29': 1, '35-39': 2, '40-44': 2}
So, how do I get the frequency of each element in a single column?
Upvotes: 0
Views: 200
Reputation: 1476
You probably want something like this.
I changed your data table slightly: 'Eesti' and 'Harju maakond' both describe a location, so I merged them into a single column. You listed 5 kinds of information but the data has 6 columns, which is why I had to do that. You will probably need to make the same change in the code that generates the table, since I guess these are names of places in Estonia.
Pandas is a good fit whenever you work with columns of data.
import pandas as pd  # Pandas dataframe (install pandas using pip install pandas)
headers = ['sex', 'age', 'location', 'coronatestanswer', 'date']
datatable = [['N', '25-29', 'Eesti, Harju maakond', 'N', '06.03.2020'],
             ['N', '35-39', 'Eesti, Harju maakond', 'N', '06.03.2020'],
             ['N', '40-44', 'Eesti, Saare maakond', 'N', '06.03.2020'],
             ['N', '35-39', 'Eesti, Tartu maakond', 'N', '06.03.2020'],
             ['M', '40-44', 'Eesti, Harju maakond', 'N', '06.03.2020']]
df = pd.DataFrame(datatable, columns=headers)  # Data frame created from the given list of lists
print(df)  # Take a look at the organized dataframe in pandas
print(df['age'].value_counts())  # Count frequency of elements in a column
Output for print(df):
   sex    age              location  coronatestanswer        date
0    N  25-29  Eesti, Harju maakond                 N  06.03.2020
1    N  35-39  Eesti, Harju maakond                 N  06.03.2020
2    N  40-44  Eesti, Saare maakond                 N  06.03.2020
3    N  35-39  Eesti, Tartu maakond                 N  06.03.2020
4    M  40-44  Eesti, Harju maakond                 N  06.03.2020
Output for frequency count:
35-39    2
40-44    2
25-29    1
Name: age, dtype: int64
Without Pandas, the code for a single column is even shorter. The drawback is that if you want the counts for every column, you end up repeating the same code for each one, and that is where Pandas shines. Python is all about making tasks easier and more efficient. :)
Anyway, here is the code without Pandas:
from collections import Counter  # Don't shout at me: this is the standard library, nothing to install.
age_list = [row[1] for row in datatable]  # List comprehension: take the age (second field) of every row.
print(Counter(age_list))
Output:
Counter({'35-39': 2, '40-44': 2, '25-29': 1})
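If the counts are needed for every column rather than just the age, one possible approach is to transpose the table with zip and build one Counter per column. This is only a sketch: it reuses the 5-column datatable from above, and the name column_counts is made up for the illustration.

```python
from collections import Counter

datatable = [['N', '25-29', 'Eesti, Harju maakond', 'N', '06.03.2020'],
             ['N', '35-39', 'Eesti, Harju maakond', 'N', '06.03.2020'],
             ['N', '40-44', 'Eesti, Saare maakond', 'N', '06.03.2020'],
             ['N', '35-39', 'Eesti, Tartu maakond', 'N', '06.03.2020'],
             ['M', '40-44', 'Eesti, Harju maakond', 'N', '06.03.2020']]

# zip(*datatable) turns the list of rows into a sequence of columns,
# so each Counter sees all the values of exactly one column.
# column_counts is a hypothetical name, not something from the question.
column_counts = [Counter(column) for column in zip(*datatable)]

print(column_counts[1])  # frequencies in the second column (age)
```

column_counts[0] then holds the counts for sex, column_counts[2] for location, and so on, using 0-based positions.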
Counter is a dict subclass. If you assign the Counter to a variable, you can look up any age group's frequency whenever you need it, like this:
age_counts = Counter([row[1] for row in datatable])
print(age_counts['40-44'])
The output is 2, of course.
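To get back to the function in the question: keys such as '06.03.2020\nN' show up because fail.read().split(";") splits only on semicolons and never on line breaks, so the last field of one line stays glued to the first field of the next. Below is a minimal sketch of how erinevused could read one column instead. It assumes the file has one record per line, fields separated by ";" and no header row, which is what the output in the question suggests; the variable name veerg is made up for the illustration.

```python
import csv
from collections import Counter


def erinevused(faili_nimi, i):
    """Return {value: count} for column i (1-based) of a ';'-separated file."""
    with open(faili_nimi, encoding="UTF-8", newline="") as fail:
        # csv.reader handles the line breaks and then splits each line on ';'
        read = csv.reader(fail, delimiter=";")
        # i is 1-based, so the list index is i - 1;
        # blank lines and rows too short to have column i are skipped
        veerg = [rida[i - 1] for rida in read if len(rida) >= i]
    # Counter keeps first-seen order; dict() gives a plain dictionary
    return dict(Counter(veerg))
```

Under those assumptions, erinevused('andmed.txt', 2) on the data from the question should return the three age groups with their counts, without any of the other columns mixed in.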
Upvotes: 1