Reputation: 43
I have a tab-delimited file with movie genre and year in 2 columns:
Comedy 2013
Comedy 2014
Drama 2012
Mystery 2011
Comedy 2013
Comedy 2013
Comedy 2014
Comedy 2013
News 2012
Sport 2012
Sci-Fi 2013
Comedy 2014
Family 2013
Comedy 2013
Drama 2013
Biography 2013
I want to group the genres together by year and print out in the following format (does not have to be in alphabetical order):
Year 2011 2012 2013 2014
Biography 0 0 1 0
Comedy 0 0 5 3
Drama 0 1 1 0
Family 0 0 1 0
Mystery 1 0 0 0
News 0 1 0 0
Sci-Fi 0 0 1 0
Sport 0 1 0 0
How should I approach it? At the moment I'm creating my output through MS Excel, but I would like to do it through Python.
Upvotes: 0
Views: 184
Reputation: 17907
If you don't like to use pandas
, you can do it as follows:
from collections import Counter
# load file
with open('tab.txt') as f:
lines = f.read().split('\n')
# replace separating whitespace with exactly one space
lines = [' '.join(l.split()) for l in lines]
# find all years and genres
genres = sorted(set(l.split()[0] for l in lines))
years = sorted(set(l.split()[1] for l in lines))
# count genre-year combinations
C = Counter(lines)
# print table
print "Year".ljust(10),
for y in years:
print y.rjust(6),
print
for g in genres:
print g.ljust(10),
for y in years:
print `C[g + ' ' + y]`.rjust(6),
print
The most interesting function is probably Counter
, which counts the number of occurrences of each element. To make sure that the length of the separating whitespace does not influence the counting, I replace it with a single space beforehand.
Upvotes: 1
Reputation: 60180
The easiest way do to this is using the pandas
library, which provides lots of way of interacting with tables of data:
df = pd.read_clipboard(names=['genre', 'year'])
df.pivot_table(index='genre', columns='year', aggfunc=len, fill_value=0)
Output:
year 2011 2012 2013 2014
genre
Biography 0 0 1 0
Comedy 0 0 5 3
Drama 0 1 1 0
Family 0 0 1 0
Mystery 1 0 0 0
News 0 1 0 0
Sci-Fi 0 0 1 0
Sport 0 1 0 0
If you're only just starting with Python, you might find trying to learn pandas
is a bit too much on top of learning the language, but once you have some Python knowledge, pandas
provides very intuitive ways to interact with data.
Upvotes: 1