I have a pandas DataFrame called wiki with Wikipedia information about some people. Each row is a different person, and the columns are 'name', 'text' and 'word_count'. The contents of 'text' have been converted to dictionary form (word: count pairs) to create the 'word_count' column.
To extract the row for Barack Obama, I do:
row = wiki[wiki['name'] == 'Barack Obama']
Now I would like to find the most frequent word. When I do:
adf=row[['word_count']]
I get another DataFrame, because:
type(adf)  # <class 'pandas.core.frame.DataFrame'>
and if I do
adf.values
I get:
array([[ {u'operations': 1, u'represent': 1, u'office': 2, ..., u'began': 1}], dtype=object)
However, what confuses me is that the size is 1:
adf.size  # 1
So I do not know how to actually extract the keys and values. Things like adf.values[1] do not work.
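A small reproduction, using a hypothetical two-row wiki frame with made-up counts in place of the real data, shows what is going on: the dictionary occupies a single cell, so adf.values is a 2-D object array of shape (1, 1) and [0, 0] is its only valid position.

```python
import pandas as pd

# Hypothetical stand-in for the real wiki frame (counts are made up)
wiki = pd.DataFrame({
    "name": ["Barack Obama", "Someone Else"],
    "text": ["obama article text", "other article text"],
    "word_count": [{"office": 2, "operations": 1, "began": 1}, {"hello": 3}],
})

row = wiki[wiki["name"] == "Barack Obama"]
adf = row[["word_count"]]

print(adf.shape)         # (1, 1): one row, one column
print(adf.size)          # 1: the whole dict sits in a single cell
print(adf.values.shape)  # (1, 1): 2-D, so adf.values[1] is out of range
counts = adf.values[0, 0]  # row 0, column 0: the dict itself
print(counts["office"])  # 2
```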
Ultimately, I need to sort the information in word_count so that the most frequent words appear first. But I would also like to understand how to access the information in a dictionary that is inside a DataFrame. I am confused about the types here. I am not new to programming, but I am relatively new to Python.
Any help would be very much appreciated.
If the name column is unique, you can make it the index of the DataFrame:
wiki.set_index("name", inplace=True)
Then you can get the value with:
wiki.at['Barack Obama', 'word_count']
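A minimal sketch of that approach, run on a hypothetical toy frame with the question's column names (the data is made up). Because .at takes a row label and a column label, it returns the single cell, which here is the dict itself rather than a DataFrame.

```python
import pandas as pd

# Hypothetical toy data mirroring the question's columns
wiki = pd.DataFrame({
    "name": ["Barack Obama", "Someone Else"],
    "text": ["obama article text", "other article text"],
    "word_count": [{"office": 2, "operations": 1}, {"hello": 3}],
})

# Use the (assumed unique) names as the row index
wiki.set_index("name", inplace=True)

# .at[row_label, column_label] returns one cell: here, the dict
counts = wiki.at["Barack Obama", "word_count"]
print(counts)  # {'office': 2, 'operations': 1}
```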
With your code:
row = wiki[wiki['name'] == 'Barack Obama']
adf = row[['word_count']]
The first line uses a boolean array to select the data; here is the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing
wiki is a DataFrame object, and row is also a DataFrame object, containing only one row if the name column is unique.
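To illustrate with a hypothetical toy frame: the comparison produces a boolean Series, and indexing with it keeps the matching rows, so the result is still a DataFrame. Passing the same mask to .loc together with the column name yields a Series instead, whose first element is the dict.

```python
import pandas as pd

# Hypothetical toy data mirroring the question's columns
wiki = pd.DataFrame({
    "name": ["Barack Obama", "Someone Else"],
    "text": ["obama article text", "other article text"],
    "word_count": [{"office": 2, "operations": 1}, {"hello": 3}],
})

mask = wiki["name"] == "Barack Obama"  # boolean Series: True only on Obama's row
row = wiki[mask]                       # boolean indexing keeps the True rows
print(type(row).__name__, len(row))    # DataFrame 1

# Select rows and column together with .loc (gives a Series),
# then take its first and only element
counts = wiki.loc[mask, "word_count"].iloc[0]
print(counts)  # {'office': 2, 'operations': 1}
```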
The second line selects a list of columns from row; here is the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#basics
So you get a DataFrame with only one row and one column, which is why adf.size is 1.
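The difference between passing a list of columns and a single column name can be seen on a hypothetical toy frame: double brackets give a one-by-one DataFrame, single brackets give a length-one Series, and either way positional access with .iloc reaches the dict.

```python
import pandas as pd

# Hypothetical toy data mirroring the question's columns
wiki = pd.DataFrame({
    "name": ["Barack Obama", "Someone Else"],
    "text": ["obama article text", "other article text"],
    "word_count": [{"office": 2, "operations": 1}, {"hello": 3}],
})
row = wiki[wiki["name"] == "Barack Obama"]

adf = row[["word_count"]]  # a list of labels -> DataFrame
s = row["word_count"]      # a single label -> Series

print(type(adf).__name__, adf.shape)  # DataFrame (1, 1)
print(type(s).__name__, s.shape)      # Series (1,)

# Both hold the same dict; position-based access gets it out
counts = adf.iloc[0, 0]
print(counts == s.iloc[0])  # True
```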
The documentation for .at[] is here: http://pandas.pydata.org/pandas-docs/stable/indexing.html#fast-scalar-value-getting-and-setting
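For the sorting the question ultimately asks about, no DataFrame indexing is needed once the dict has been extracted. A minimal sketch, using only the few entries visible in the question's output as sample data, showing three equivalent options: sorted with a key, collections.Counter.most_common, and a pandas Series sorted by value.

```python
from collections import Counter

import pandas as pd

# Only the entries visible in the question's output (the real dict is longer)
counts = {"operations": 1, "represent": 1, "office": 2, "began": 1}

# Plain Python: sort (word, count) pairs by count, most frequent first
by_frequency = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
print(by_frequency[0])  # ('office', 2)

# Counter.most_common does the same descending sort
print(Counter(counts).most_common(1))  # [('office', 2)]

# Or stay in pandas: a Series indexed by word, sorted by count
freq = pd.Series(counts).sort_values(ascending=False)
print(freq.index[0])  # office
```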