ksalerno

Reputation: 191

Finding the median of an entire pandas DataFrame

I'm trying to find the median flow across the entire DataFrame. The first step is to select only certain items in the DataFrame.

There were two problems with this. First, the result included parts of the DataFrame that aren't in 'states'. Second, the median was not a single value; it was computed per row. How do I get the overall median of all the data in the DataFrame?

Upvotes: 13

Views: 3547

Answers (2)

MattR

Reputation: 5146

The DataFrame you pasted is slightly messy because of some stray spaces, but the approach is to melt the DataFrame and then call median() on the melted result:

df2 = pd.melt(df, id_vars=['U.S.'])
print(df2['value'].median())

Your DataFrame may be slightly different, but the concept is the same. Check the comment that I left above to understand pd.melt(), especially the value_vars and id_vars arguments.
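To make the role of id_vars and value_vars concrete, here is a minimal sketch on made-up data. The column names, the numbers, and the `states` list are assumptions for illustration only, not the asker's actual DataFrame. Passing value_vars limits the melt to the listed columns, which is one way to keep columns outside 'states' out of the median.

```python
import pandas as pd

# Hypothetical data: one label column plus one numeric column per state.
df = pd.DataFrame({
    "U.S.": ["2001", "2002", "2003"],
    "Alabama": [1.0, 4.0, 7.0],
    "Alaska": [2.0, 5.0, 8.0],
    "Arizona": [3.0, 6.0, 100.0],
})

# Hypothetical subset of columns that should count toward the median.
states = ["Alabama", "Alaska"]

# id_vars is kept as a label column; value_vars restricts which columns
# are unpivoted into the long 'value' column.
long_df = pd.melt(df, id_vars=["U.S."], value_vars=states)

# A single median over every remaining cell.
overall_median = long_df["value"].median()
print(overall_median)
```

Because "Arizona" is not in `states`, its values never reach the 'value' column and do not affect the result.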

Here is a detailed walk-through of how I cleaned the data and got the correct answer:

import pandas as pd

# reading the pasted table in from the clipboard
df = pd.read_clipboard()

# printing it out to see and also the column names
print(df)
print(df.columns)

# melting the DF and then printing the result
df2 = pd.melt(df, id_vars=['U.S.'])
print(df2)

# Creating a new DF with the nulls dropped, which keeps the later code easier to read
# using .copy() to avoid the pandas SettingWithCopyWarning when assigning to a slice
df3 = df2.dropna().copy()

# there were some funky values in the 'value' column. Just getting rid of those
df3.loc[df3.value.isin(['Columbia', 'of']), 'value'] = 99

# printing out the cleaned version and getting the median
print(df3)
print(df3['value'].median())
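One caveat about the step above: replacing the stray text with 99 feeds a placeholder number into the median. A possible alternative, which is not what this answer does, is to coerce non-numeric entries to NaN with pd.to_numeric so that median() skips them. The sketch below uses a made-up 'value' column, not the real data.

```python
import pandas as pd

# Hypothetical melted column with stray text left over from a messy paste.
melted = pd.DataFrame({"value": ["10", "20", "Columbia", "of", "30", None]})

# errors="coerce" turns anything that cannot be parsed as a number into NaN.
numeric = pd.to_numeric(melted["value"], errors="coerce")

# Series.median() skips NaN by default, so only 10, 20 and 30 are used.
clean_median = numeric.median()
print(clean_median)
```

Whether dropping or substituting the bad cells is the right call depends on what those cells were meant to hold in the original table.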

Upvotes: 1

user2285236

Reputation:

Two options:

1) A pandas option:

df.stack().median()

2) A numpy option:

np.median(df.values)
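The two options agree when every cell is a number, but they can differ when the frame contains missing values. The sketch below uses a small made-up frame (the column names and numbers are assumptions) to show the difference, along with np.nanmedian as a NaN-aware NumPy variant.

```python
import numpy as np
import pandas as pd

# Hypothetical all-numeric frame with one missing cell.
df = pd.DataFrame({
    "Alabama": [1.0, 4.0, np.nan],
    "Alaska": [2.0, 5.0, 8.0],
})

# Option 1: stack() collapses the frame into one Series, and
# Series.median() skips NaN by default.
pandas_median = df.stack().median()

# Option 2: np.median works on the raw array and propagates NaN.
numpy_median = np.median(df.values)

# NaN-aware NumPy variant that ignores missing cells.
nan_aware_median = np.nanmedian(df.values)

print(pandas_median, numpy_median, nan_aware_median)
```

Both options assume the frame holds only the numeric columns of interest, so any label columns or columns outside 'states' would need to be dropped or deselected first.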

Upvotes: 18
