Reputation: 191
I'm trying to find the median flow of the entire dataframe. The first part of this is to select only certain items in the dataframe.
There were two problems with this: it included parts of the dataframe that aren't in 'states', and the median was not a single value but was computed per row. How would I get the overall median of all the data in the dataframe?
Upvotes: 13
Views: 3547
Reputation: 5146
The DataFrame you pasted is slightly messy due to some spaces, but you're going to want to melt the DataFrame and then use median() on the new melted DataFrame:
df2 = pd.melt(df, id_vars =['U.S.'])
print(df2['value'].median())
Your DataFrame may be slightly different, but the concept is the same. Check the comment that I left above to understand pd.melt(), especially the value_vars and id_vars arguments.
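As a rough illustration of what id_vars and value_vars do, here is a minimal sketch on made-up data. The column names '2010' and '2011' and all the numbers are assumptions for the example, not taken from the question's DataFrame:

```python
import pandas as pd

# Hypothetical data: one label column plus two numeric columns.
df = pd.DataFrame({
    "U.S.": ["Alabama", "Alaska", "Arizona"],
    "2010": [1.0, 2.0, 3.0],
    "2011": [4.0, 5.0, 6.0],
})

# id_vars are kept as identifier columns; value_vars are unpivoted into rows.
melted = pd.melt(df, id_vars=["U.S."], value_vars=["2010", "2011"])

# The melted frame has the columns "U.S.", "variable" and "value",
# so a single median over "value" covers every numeric cell.
overall_median = melted["value"].median()
print(overall_median)
```

If value_vars is left out, melt unpivots every column that is not listed in id_vars.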
Here is, in detail, how I went about cleaning the data and getting the correct answer:
import pandas as pd
# reading in the table copied to the clipboard
df = pd.read_clipboard()
# printing it out to see and also the column names
print(df)
print(df.columns)
# melting the DF and then printing the result
df2 = pd.melt(df, id_vars =['U.S.'])
print(df2)
# Creating a new DF with the null rows dropped, for ease of code readability
# using .copy() to avoid pandas' SettingWithCopyWarning on the assignment below
df3 = df2.dropna().copy()
# there were some funky values in the 'value' column. Just getting rid of those
df3.loc[df3.value.isin(['Columbia', 'of']), 'value'] = 99
# printing out the cleaned version and getting the median
print(df3)
print(df3['value'].median())
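One caveat with replacing the stray strings with 99 is that the placeholder itself takes part in the median. A possible alternative, sketched here on made-up values and not what the steps above do, is to convert the column with pd.to_numeric(..., errors='coerce') so that non-numeric entries become NaN, which median() skips by default:

```python
import pandas as pd

# Hypothetical melted frame with stray text fragments in 'value'.
df2 = pd.DataFrame({
    "U.S.": ["a", "b", "c", "d", "e"],
    "variable": ["x"] * 5,
    "value": ["10", "of", "30", "Columbia", "20"],
})

# Anything that cannot be parsed as a number becomes NaN.
cleaned = pd.to_numeric(df2["value"], errors="coerce")

# Series.median() ignores NaN by default (skipna=True).
clean_median = cleaned.median()
print(clean_median)
```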
Upvotes: 1
Reputation:
Two options:
1) A pandas option:
df.stack().median()
2) A numpy option:
np.median(df.values)
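The two options can behave differently when the frame contains missing values. A small sketch on made-up data, assuming every column is numeric: Series.median() skips NaN, np.median propagates it, and np.nanmedian is the NaN-aware NumPy counterpart.

```python
import numpy as np
import pandas as pd

# Hypothetical all-numeric frame with one missing value.
df = pd.DataFrame({"a": [1.0, 2.0, np.nan], "b": [4.0, 5.0, 6.0]})

# pandas: flatten to a Series, then take the median (NaN is skipped).
stack_median = df.stack().median()

# NumPy: np.median returns nan if any element is NaN ...
plain = np.median(df.values)

# ... while np.nanmedian ignores NaN entries.
nan_aware = np.nanmedian(df.values)

print(stack_median, plain, nan_aware)
```

With no missing values, all three give the same result.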
Upvotes: 18