Choosing a data structure in Python

Question

I want to manipulate data of this form:

{red -> 1,5,6,7,5,11,...}
{green -> 2,3,4,10,11,12,...}
{blue -> 2,3,5,6,7,8,9,10,...}

where the colors are keys and the numbers are, let's say, locations (non-key integer values).

I'll have a lot of colors, and a lot of associated numbers.

I want to perform operations such as counting the total number of colors, finding the top 5 colors with the most numbers, etc.

What data structure in Python would you suggest for this (one that stores a key along with its associated non-key entries)?

I know this is a broad question. I'm trying to solve this problem, if that helps.

PS: I'm following an online course, and this is not homework. Even if it were, my question doesn't ask for a solution, I think.

EDIT

That data collection contains a lot of small .txt files with some text in them. In the data structure, I eventually want to store the unique words from all of those .txt files, along with pointers to the document IDs where each word appears.

Ex:

1.txt
"The weather today is good"
2.txt
"It is going to rain today"

The data structure should look like this (the numbers are docids):
{
The->1
weather->1
today->1,2
is->1,2
good->1
it->2
going->2
to->2
rain->2
}
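A minimal sketch of building that structure (an inverted index), assuming each file's contents are already loaded into a string keyed by a hypothetical integer docid, and that words are split on whitespace and compared case-insensitively (so "The" and "It" are stored as "the" and "it"). A dict of sets is used so that a word appearing twice in the same file records that docid only once.

```python
# Hypothetical in-memory documents: docid -> file contents.
# In practice these would be read from the .txt files.
documents = {
    1: "The weather today is good",
    2: "It is going to rain today",
}

# Inverted index: word -> set of docids containing that word.
index = {}
for docid, text in documents.items():
    # Assumption: lowercase and split on whitespace; no punctuation handling.
    for word in text.lower().split():
        index.setdefault(word, set()).add(docid)

# Print in the "word->docids" form used in the question.
for word, docids in index.items():
    print(word + "->" + ",".join(str(d) for d in sorted(docids)))
```

For the two example files this prints one line per unique word, e.g. `today->1,2` and `rain->2`.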

Batman · Accepted Answer

What you want is almost certainly a dictionary of lists.

data = {"red": [1, 5, 6, 7, 5, 11],
        "green": [2, 3, 4, 10, 11, 12],
        "blue": [2, 3, 5, 6, 7, 8, 9, 10],
        }

To get the total number of colours:

number = len(data)
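Other counts follow the same pattern. A sketch, assuming the same `data` dictionary: note that "red" lists 5 twice, so the number of entries and the number of distinct locations can differ.

```python
data = {"red": [1, 5, 6, 7, 5, 11],
        "green": [2, 3, 4, 10, 11, 12],
        "blue": [2, 3, 5, 6, 7, 8, 9, 10],
        }

# Total number of entries across all colours (duplicates counted).
total_entries = sum(len(numbers) for numbers in data.values())

# Number of distinct locations across all colours.
distinct_locations = len(set().union(*data.values()))

# Distinct locations per colour (drops the repeated 5 in "red").
unique_per_colour = {colour: len(set(numbers)) for colour, numbers in data.items()}
```

If duplicates should never be stored in the first place, using sets instead of lists as the dictionary values is a reasonable alternative.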

To get a list of the colours sorted by the length of their values, longest first:

sorted_colours = sorted(data, key=lambda x: len(data[x]), reverse=True)
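Since `sorted` returns a list of keys, the top 5 colours are its first five elements. A sketch, assuming the same `data` dictionary; `heapq.nlargest` selects the same colours without sorting every key, which may help when there are many colours.

```python
import heapq

data = {"red": [1, 5, 6, 7, 5, 11],
        "green": [2, 3, 4, 10, 11, 12],
        "blue": [2, 3, 5, 6, 7, 8, 9, 10],
        }

# Full sort, then slice off the first five keys.
sorted_colours = sorted(data, key=lambda colour: len(data[colour]), reverse=True)
top_five = sorted_colours[:5]

# Same selection without a full sort: only the 5 largest are kept.
top_five_heap = heapq.nlargest(5, data, key=lambda colour: len(data[colour]))
```

With only three colours here, both lists contain all three, with "blue" first.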

But you should probably check out defaultdict, OrderedDict, and Counter from the collections module.
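A sketch of how two of those classes could fit this problem, assuming the colour/number data arrives as hypothetical `(colour, number)` pairs: `defaultdict(list)` builds the dictionary of lists without checking whether a key already exists, and `Counter.most_common` answers the "top 5" question directly.

```python
from collections import Counter, defaultdict

# Hypothetical input: (colour, number) pairs, e.g. parsed from a file.
pairs = [("red", 1), ("green", 2), ("blue", 2), ("red", 5), ("blue", 3)]

# Missing keys start as an empty list, so append always works.
data = defaultdict(list)
for colour, number in pairs:
    data[colour].append(number)

# Map each colour to how many numbers it has, then take the largest.
sizes = Counter({colour: len(numbers) for colour, numbers in data.items()})
top_five = sizes.most_common(5)  # list of (colour, count) pairs
```

`most_common` orders colours with equal counts in the order they were first encountered, so here "red" comes before "blue".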
