Reputation: 955
I want to manipulate data of this form:
{red -> 1,5,6,7,5,11,...}
{green -> 2,3,4,10,11,12,...}
{blue -> 2,3,5,6,7,8,9,10,...}
where colors are keys, and numbers are, let's say, some locations (non-key integer values).
I'll have a lot of colors, and a lot of associated numbers.
I want to perform operations such as counting the total number of colors, finding the top 5 colors with the most numbers, etc.
What data structure in Python would you suggest for this (one that stores a key along with its associated non-key entries)?
I know this is a broad question. I'm trying to solve this problem, if that helps.
PS. I'm trying to follow an online course, and this is not homework. Even if it were, my question is not asking for a solution, I guess.
EDIT
The data collection contains a lot of small txt files with some text in them. Eventually, I want the data structure to store the unique words from all those txt files, along with pointers to the document IDs where those words appear.
Ex:
1.txt
"The weather today is good"
2.txt
"It is going to rain today"
The data structure should be (numbers are document IDs):
{
The->1
weather->1
today->1,2
is->1,2
good->1
it->2
going->2
to->2
rain->2
}
Upvotes: 1
Views: 255
Reputation: 8947
What you want is almost certainly a dictionary of lists.
data = {
    "red": [1, 5, 6, 7, 5, 11],
    "green": [2, 3, 4, 10, 11, 12],
    "blue": [2, 3, 5, 6, 7, 8, 9, 10],
}
To get the total number of colours:
number = len(data)
To get a list of the colours sorted by how many values each one has, largest first:
sorted_colours = sorted(data, key=lambda x: len(data[x]), reverse=True)
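For the "top 5 colours" part of the question, one possible sketch is to slice the sorted list, or to use heapq.nlargest from the standard library when only a few entries are needed. The names top_5 and top_2 are just illustrative:

```python
import heapq

data = {
    "red": [1, 5, 6, 7, 5, 11],
    "green": [2, 3, 4, 10, 11, 12],
    "blue": [2, 3, 5, 6, 7, 8, 9, 10],
}

# Rank colours by how many numbers they hold, largest first.
sorted_colours = sorted(data, key=lambda colour: len(data[colour]), reverse=True)

# Slicing the ranked list gives the top N (fewer if there are fewer colours).
top_5 = sorted_colours[:5]

# heapq.nlargest avoids sorting everything when only a few are wanted.
top_2 = heapq.nlargest(2, data, key=lambda colour: len(data[colour]))
```

With a lot of colours, nlargest is likely the cheaper option, since it does not need to sort the whole dictionary.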
But you should probably check out defaultdict, OrderedDict, and Counter from the collections module.
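As a rough sketch of how defaultdict and Counter might fit the word/document example from the question: the documents dictionary below is a hypothetical stand-in for reading the txt files, and lowercasing the words is an assumption (the question lists both "The" and "it"). A set is used instead of a list so that each document ID is stored only once per word.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for the text files: doc ID -> file contents.
documents = {
    1: "The weather today is good",
    2: "It is going to rain today",
}

# A missing word starts out with an empty set, so no key check is needed.
index = defaultdict(set)
for doc_id, text in documents.items():
    # Assumption: words are compared case-insensitively.
    for word in text.lower().split():
        index[word].add(doc_id)

# Number of distinct words across all documents.
unique_words = len(index)

# Count the documents per word, then ask for the most widespread words.
doc_counts = Counter({word: len(ids) for word, ids in index.items()})
most_common = doc_counts.most_common(2)
```

Here index["today"] ends up holding both document IDs, and most_common plays the same role as the "top 5" query on the colour data.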
Upvotes: 3