Reputation: 706

Data structure for working with rows and columns

I have data grabbed to Python that is in table form:

Name  Sport   Score  
John  Golf    100
Jill  Rugby   55
John  Hockey  100
Bob   Golf    45

How can I format this table in Python that would make it easy to sort or group items. For example, if I wanted to see all the names of people that played Golf or all of the people that scored 100 on any sport. Or all of the data for just John.

Upvotes: 0

Answers (5)

Ahasanul Haque

Reputation: 11164

What about this one?

yourDS={"name":["John","Jill","John","Bob"],
    "sport":["Golf","Rugby","Hockey","Golf"],
    "score":[100,55,100,45]
}

This should hold the relation of each entry as list are ordered.

To avoid the effect of duplicate element in a list, first make a new set from the list.

For your expected query, you can do something like that.

for index,value in enumerate(yourDS["score"]):
    if value=="x":
        print yourDS["name"][index]

It's better to use a list to store the result and make it a set, to avoid some cases for example, if a man has score of x in two different games.

Upvotes: 1

emre.

Reputation: 1296

map and filter with namedtuples and lambdas can be used for this task.

from collections import namedtuple

# Create a named tuple to store the rows
Row = namedtuple('Row', ('name', 'sport', 'score'))

data = '''Name  Sport   Score  
          John  Golf    100
          Jill  Rugby   55
          John  Hockey  100
          Bob   Golf    45'''

# Read the data, skip the first line
lines = data.splitlines()[1:]
rows = []
for line in lines:
    name, sport, score = line.strip().split()
    rows.append(Row(name, sport, int(score)))

# People that played Golf
golf_filter = lambda row: row.sport == 'Golf'
golf_players = filter(golf_filter, rows)

# People that scored 100 on any sport
score_filter = lambda row: row.score == 100
scorers = filter(score_filter, rows)

# People named John
john_filter = lambda row: row.name == 'John'
john_data = filter(john_filter, rows)

# If you want a specific column than you can map the data
# Names of golf players
get_name = lambda row: row.name
golf_players_names = map(get_name, golf_players)

Results:

>>> golf_players
[Row(name='John', sport='Golf', score=100),
 Row(name='Bob', sport='Golf', score=45)]

>>> john_data
[Row(name='John', sport='Golf', score=100),
 Row(name='John', sport='Hockey', score=100)]

>>> scorers
[Row(name='John', sport='Golf', score=100),
 Row(name='John', sport='Hockey', score=100)]

>>> golf_players_names
['John', 'Bob']

Upvotes: 1

JesseTG

Reputation: 2123

If you want to retrieve information based on your data, I'd go with SQL. It's well-suited to answering questions like these:

...to see all the names of people that played Golf...

...all of the people that scored 100 on any sport...

...all of the data for just John.

The most popular database language these days is SQL, and as it happens Python actually has built-in support for it through the sqlite3 module.

SQL, while not a monumental task to learn, is beyond the scope of this answer. To learn that, I'd recommend checking out Codecademy, Code School, or SQLZOO (they're all interactive).

Or, if you just want to read it in and write it out without caring about what it actually means, consider using the csv module, which is also built-in.

Upvotes: -1

DeepSpace

Reputation: 81684

pandas' DataFrame will be the way to go:

import pandas as pd

df = pd.DataFrame({'Name': ['John', 'Jill', 'John', 'Bob'], 
                   'Sport' : ['Golf', 'Rugby', 'Hockey', 'Golf'],
                   'Score': [100, 50, 100, 45]}) 

# the names of people that played Golf

df[df['Sport'] == 'Golf']['Name'].unique()
>> ['John' 'Bob']

# all of the people that scored 100 on any sport.

df[df['Score'] == 100]['Name'].unique()
>> ['John']

# all of the data for just John.
df[df['Name'] == 'John']
>>    Name  Score   Sport
   0  John    100    Golf
   2  John    100  Hockey

Upvotes: 1

Boaz Galil

Reputation: 72

You can create list of lists. each row will be a list inside a list.

lst1=[['John','Golf',100],['Jill','Rugby',55],['John','Hockey',100],['Bob','Golf',45]]
lst100=[]
for lst in lst1:
    if lst[2]==100:
        lst100.append(lst)
print lst100

Upvotes: -1

Data structure for working with rows and columns

Answers (5)

Related Questions