user3177938

Reputation: 445

python, dictionary in a data frame, sorting

I have a pandas data frame called wiki, holding the Wikipedia information for some people. Each row is a different person, and the columns are 'name', 'text' and 'word_count'. Each entry in 'word_count' is a dictionary (word as key, count as value) built from the corresponding 'text'.

If I want to extract the row related to Barack Obama, then:

row = wiki[wiki['name'] == 'Barack Obama']

Now, I would like the most popular word. When I do:

adf=row[['word_count']]

I get another data frame because I see that:

type(adf)=<class 'pandas.core.frame.DataFrame'>

and if I do

adf.values

I get:

array([[{u'operations': 1, u'represent': 1, u'office': 2, ..., u'began': 1}]], dtype=object)

However, what is very confusing to me is that the size is 1

adf.size=1

Therefore, I do not know how to actually extract the keys and values. Things like adf.values[1] do not work.

Ultimately, what I need to do is sort the information in word_count so that the most frequent words appear first. But I would also like to understand how to access the information that is inside a dictionary, inside a data frame. I am lost about the types here. I am not new to programming, but I am relatively new to Python.

Any help would be very very much appreciated

Upvotes: 0

Views: 707

Answers (1)

HYRY

Reputation: 97291

If the name column is unique, you can make it the index of the DataFrame object: wiki.set_index("name", inplace=True). Then you can get the value with: wiki.at['Barack Obama', 'word_count'].
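As a minimal sketch of that approach: the small wiki frame below is a made-up stand-in for the real data, and the name indexed is introduced only for illustration. It uses the non-inplace form of set_index so the original frame is left untouched.

```python
import pandas as pd

# Hypothetical stand-in for the asker's `wiki` frame; the rows are made up.
wiki = pd.DataFrame({
    "name": ["Barack Obama", "Ada Lovelace"],
    "text": ["the office the law", "the engine"],
    "word_count": [
        {"the": 2, "office": 1, "law": 1},
        {"the": 1, "engine": 1},
    ],
})

# set_index returns a new frame here; inplace=True would modify wiki itself.
indexed = wiki.set_index("name")

# .at takes one row label and one column label and returns that single cell,
# which in this frame is a plain Python dict.
obama_counts = indexed.at["Barack Obama", "word_count"]

print(type(obama_counts))
print(obama_counts["the"])
```

The cell comes back as an ordinary dictionary, so the usual dict methods (keys(), values(), items()) apply to it directly.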

With your code:

row = wiki[wiki['name'] == 'Barack Obama']
adf = row[['word_count']]

The first line uses a boolean array to select the rows; here is the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing

wiki is a DataFrame object, and row is also a DataFrame object, with only one row if the name column is unique.

The second line selects a list of columns from row; here is the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#basics

You get a DataFrame with only one row and one column.
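That is why the size is 1: the frame holds a single cell, and that cell is the whole dictionary. A rough sketch of how to reach it, again with a made-up wiki frame (the names counts, same and series are only for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the asker's `wiki` frame; the rows are made up.
wiki = pd.DataFrame({
    "name": ["Barack Obama", "Ada Lovelace"],
    "word_count": [
        {"the": 2, "office": 1, "law": 1},
        {"the": 1, "engine": 1},
    ],
})

row = wiki[wiki["name"] == "Barack Obama"]  # boolean indexing: a 1-row DataFrame
adf = row[["word_count"]]                   # list of columns: a 1x1 DataFrame

print(adf.shape)  # one row, one column, so adf.size is 1

# Positional access to the single cell gives back the dict itself.
counts = adf.iloc[0, 0]

# adf.values is a 2-D array of shape (1, 1), so the only valid position is
# [0, 0]; index 1 is out of bounds, which is why adf.values[1] fails.
same = adf.values[0, 0]

# Single brackets select one column as a Series instead of a DataFrame.
series = row["word_count"]
print(type(series))
```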

And here is the documentation for .at[]: http://pandas.pydata.org/pandas-docs/stable/indexing.html#fast-scalar-value-getting-and-setting
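Once the dictionary is out of the frame, sorting it by count is plain Python. This is one possible sketch rather than the only way; the counts below are invented, and ranked, top_word and top_count are illustrative names.

```python
# Made-up word counts standing in for the dict pulled out of the frame.
counts = {"law": 6, "the": 40, "obama": 9, "in": 30}

# items() yields (word, count) pairs; sort on the count, largest first.
ranked = sorted(counts.items(), key=lambda pair: pair[1], reverse=True)

# The first pair is the most frequent word.
top_word, top_count = ranked[0]

for word, count in ranked:
    print(word, count)
```

The result is a list of (word, count) tuples in descending order of count, so ranked[:10] would give the ten most frequent words.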

Upvotes: 1
