LCh

Reputation: 9

DataFrame from dictionary with variable-length values

So for this assignment I managed to create a dictionary where the keys are state names (e.g. Alabama, Alaska, Arizona) and the values are lists of regions in each state. The problem is that the lists of regions have different lengths: each state can have a different number of regions associated with it.

For example, the first few entries look like this:

{'Alabama': ['Auburn', 'Florence', 'Jacksonville', 'Livingston',
             'Montevallo', 'Troy', 'Tuscaloosa', 'Tuskegee'],
 'Alaska': ['Fairbanks'],
 'Arizona': ['Flagstaff', 'Tempe', 'Tucson']}

How can I turn this into a pandas DataFrame? What I want is basically two columns, "State" and "Region", with one row per state-region pair, similar to what you would get if you did a "GroupBy" on state for the regions.

Upvotes: 1

Views: 80

Answers (2)

Manthan Trivedi

Reputation: 99

You can also do this by splitting the dictionary into two flat lists, one of states and one of regions, although this approach is a little longer. For example:

import pandas as pd

Example = {'Alabama': ['Auburn', 'Florence', 'Jacksonville', 'Livingston',
                       'Montevallo', 'Troy', 'Tuscaloosa', 'Tuskegee'],
           'Alaska': ['Fairbanks'],
           'Arizona': ['Flagstaff', 'Tempe', 'Tucson']}

new_list_of_keys = []    # one state name per output row
new_list_of_values = []  # one region per output row

# Repeat each state once for every region in its list
for state, regions in Example.items():
    for region in regions:
        new_list_of_keys.append(state)
        new_list_of_values.append(region)

df = pd.DataFrame(list(zip(new_list_of_keys, new_list_of_values)), columns=['State', 'Region'])

This gives the following output:

      State        Region
0   Alabama        Auburn
1   Alabama      Florence
2   Alabama  Jacksonville
3   Alabama    Livingston
4   Alabama    Montevallo
5   Alabama          Troy
6   Alabama    Tuscaloosa
7   Alabama      Tuskegee
8    Alaska     Fairbanks
9   Arizona     Flagstaff
10  Arizona         Tempe
11  Arizona        Tucson
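The same flattening can be written more compactly as a single list comprehension that yields one (state, region) tuple per row. This is a minimal sketch, assuming the dictionary is named Example as in the code above; it should produce the same DataFrame as the explicit loop.

```python
import pandas as pd

Example = {'Alabama': ['Auburn', 'Florence', 'Jacksonville', 'Livingston',
                       'Montevallo', 'Troy', 'Tuscaloosa', 'Tuskegee'],
           'Alaska': ['Fairbanks'],
           'Arizona': ['Flagstaff', 'Tempe', 'Tucson']}

# One (state, region) tuple per region, so states with more regions get more rows
rows = [(state, region) for state, regions in Example.items() for region in regions]

df = pd.DataFrame(rows, columns=['State', 'Region'])
print(df)
```

A state whose list is empty contributes no tuples, so it simply does not appear in the result.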

Upvotes: 0

Quang Hoang

Reputation: 150735

If you are on pandas 0.25 or later, you can use explode (here states is the dictionary from the question):

pd.Series(states).explode()

Output:

Alabama          Auburn
Alabama        Florence
Alabama    Jacksonville
Alabama      Livingston
Alabama      Montevallo
Alabama            Troy
Alabama      Tuscaloosa
Alabama        Tuskegee
Alaska        Fairbanks
Arizona       Flagstaff
Arizona           Tempe
Arizona          Tucson
dtype: object
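explode returns a Series indexed by state rather than the two-column table asked for in the question. A minimal sketch of one way to finish the job, assuming the dictionary is named states: give the values a name, name the index, then move the index into a column with reset_index.

```python
import pandas as pd

states = {'Alabama': ['Auburn', 'Florence', 'Jacksonville', 'Livingston',
                      'Montevallo', 'Troy', 'Tuscaloosa', 'Tuskegee'],
          'Alaska': ['Fairbanks'],
          'Arizona': ['Flagstaff', 'Tempe', 'Tucson']}

df = (
    pd.Series(states, name='Region')  # index: state names, values: lists of regions
    .explode()                        # one row per region; the state label repeats
    .rename_axis('State')             # name the index so it becomes the 'State' column
    .reset_index()                    # index -> column, rows renumbered 0..n-1
)
print(df)
```

Note that explode turns an empty list into a single row with NaN as the region, so a state with no regions still shows up once.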

You can also use concat, which works on most pandas versions:

pd.concat(pd.DataFrame({'State': k, 'Region': v}) for k, v in states.items())

Output:

     State        Region
0  Alabama        Auburn
1  Alabama      Florence
2  Alabama  Jacksonville
3  Alabama    Livingston
4  Alabama    Montevallo
5  Alabama          Troy
6  Alabama    Tuscaloosa
7  Alabama      Tuskegee
0   Alaska     Fairbanks
0  Arizona     Flagstaff
1  Arizona         Tempe
2  Arizona        Tucson
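Because each per-state DataFrame starts its own index at 0, the concatenated result has repeated index labels (0 to 7, then 0, then 0 to 2). If you want a clean 0..n-1 index, pd.concat accepts ignore_index=True. A minimal sketch, again assuming the dictionary is named states:

```python
import pandas as pd

states = {'Alabama': ['Auburn', 'Florence', 'Jacksonville', 'Livingston',
                      'Montevallo', 'Troy', 'Tuscaloosa', 'Tuskegee'],
          'Alaska': ['Fairbanks'],
          'Arizona': ['Flagstaff', 'Tempe', 'Tucson']}

# Build one small DataFrame per state (the scalar state name is broadcast
# to every region), then stack them; ignore_index=True renumbers the rows
df = pd.concat(
    (pd.DataFrame({'State': k, 'Region': v}) for k, v in states.items()),
    ignore_index=True,
)
print(df)
```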

Upvotes: 2
