user19107602

Reputation: 29

Groupby year dropping some variables

Below is the original data. I need the mean of all the variables for each year.

[Image: original data]

But when I use the groupby('year') command, it drops all variables except 'lnmcap' and 'epu'.

[Image: output after groupby('year')]

Why is this happening, and what do I need to do?

Upvotes: 2

Views: 59

Answers (3)

blackraven

Reputation: 5597

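The missing columns are most likely stored as strings (dtype object) even though they look numeric. Older pandas versions silently drop such non-numeric columns from groupby(...).mean(); pandas 2.0 and later raise a TypeError instead. You will need to convert those columns to float. Use ds.info() to check the data type of each column.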

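# try converting every object (string) column to float;
# columns that cannot be parsed (e.g. 'company') are left unchanged
for col in ds.select_dtypes(['object']).columns:
    try:
        ds[col] = ds[col].astype('float')
    except (ValueError, TypeError):
        continue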

After this, run ds.info() again: columns holding strings such as '1.604809' will now be float64, e.g. 1.604809.
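A minimal sketch of the effect, assuming a toy frame with made-up values and a hypothetical string column 'lev' standing in for the columns that were dropped. It uses a variant of the loop above that selects every non-numeric column. Before the conversion only the float columns survive the yearly mean; afterwards 'lev' is averaged too. numeric_only=True is passed explicitly because recent pandas versions raise a TypeError on the string 'company' column instead of skipping it.

```python
import pandas as pd

# toy data (made-up values); 'lev' is a hypothetical column whose numbers are stored as strings
ds = pd.DataFrame({
    'company': ['A', 'B', 'A', 'B'],
    'year': [2010, 2010, 2011, 2011],
    'lnmcap': [1.5, 2.5, 3.0, 4.0],
    'epu': [100.0, 100.0, 120.0, 120.0],
    'lev': ['0.2', '0.4', '0.6', '0.8'],
})

# before: 'lev' is not numeric, so it is left out of the yearly mean
before = ds.groupby('year').mean(numeric_only=True)
print(list(before.columns))  # ['lnmcap', 'epu']

# convert every non-numeric column that can be parsed as float
for col in ds.select_dtypes(exclude='number').columns:
    try:
        ds[col] = ds[col].astype('float')
    except (ValueError, TypeError):
        continue  # 'company' cannot be parsed, so it stays as strings

after = ds.groupby('year').mean(numeric_only=True)
print(list(after.columns))  # ['lnmcap', 'epu', 'lev']
```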

Sometimes a column contains "dirty" values that cannot be converted to float, and then astype('float') fails for the whole column. In that case, use pd.to_numeric with errors='coerce', which turns non-numeric values into NaN:

import pandas as pd

# every column except the identifier columns should be numeric
column_names = list(ds.columns)
column_names.remove('company')
column_names.remove('year')
for col in column_names:
    ds[col] = pd.to_numeric(ds[col], errors='coerce')  # non-numeric values become NaN
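A small sketch of what errors='coerce' does, using made-up values and a hypothetical dirty entry 'n/a': the unparseable value becomes NaN, and the yearly mean then skips it, so 2010 is averaged over its one valid value.

```python
import pandas as pd

# made-up data; 'n/a' stands in for a "dirty" value
ds = pd.DataFrame({
    'company': ['A', 'B', 'A', 'B'],
    'year': [2010, 2010, 2011, 2011],
    'lnmcap': ['1.5', 'n/a', '3.0', '4.0'],
})

ds['lnmcap'] = pd.to_numeric(ds['lnmcap'], errors='coerce')
print(ds['lnmcap'].isna().sum())  # 1

# mean() skips NaN by default
yearly = ds.groupby('year')['lnmcap'].mean()
print(yearly.loc[2010], yearly.loc[2011])  # 1.5 3.5
```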

Upvotes: 1

MiH

Reputation: 352

You might want to convert all numeric columns to float before taking their mean, for example:

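import pandas as pd

cols = list(ds.columns)

# remove the non-numeric identifier columns
cols.remove('company')
cols.remove('year')

# convert the remaining columns to numbers (unparseable values become NaN)
for col in cols:
    ds[col] = pd.to_numeric(ds[col], errors='coerce')

# after that you can apply the aggregation; numeric_only=True skips the
# string 'company' column, which pandas 2.0+ would otherwise reject
ds.groupby('year').mean(numeric_only=True)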
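For illustration, the same steps on a small made-up frame in which every value column arrives as strings. This sketch drops 'company' before grouping, which is an alternative to passing numeric_only=True; either way the result has one row per year and a mean for every converted column.

```python
import pandas as pd

# made-up data: all value columns stored as strings
ds = pd.DataFrame({
    'company': ['A', 'B', 'A', 'B'],
    'year': [2010, 2010, 2011, 2011],
    'lnmcap': ['1.0', '3.0', '2.0', '4.0'],
    'epu': ['100', '110', '120', '130'],
})

# convert everything except the identifier columns
cols = [c for c in ds.columns if c not in ('company', 'year')]
for col in cols:
    ds[col] = pd.to_numeric(ds[col], errors='coerce')

# drop the string column instead of relying on numeric_only
yearly = ds.drop(columns='company').groupby('year').mean()
print(yearly)
```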

Upvotes: 1

Debi Prasad

Reputation: 312

The other columns probably hold object (string) data instead of numbers, which is why only 'lnmcap' and 'epu' got an average.
Use ds.dtypes or simply ds.info() to check the data type of each column.
If they turn out to be object/string, use:

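# drop the non-numeric identifier column, then convert the rest to float
ds = ds.drop('company', axis=1)
column_names = ds.columns.drop('year')  # keep 'year' unchanged so the group labels stay as they were
for i in column_names:
    ds[i] = ds[i].astype(str).astype(float)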

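This should work as long as every value in those columns can be parsed as a number; a single non-numeric entry makes astype(float) raise a ValueError.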
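A quick sketch of that limitation, with a made-up series containing a hypothetical dirty value 'n/a': the astype chain raises a ValueError for the whole column, while pd.to_numeric(..., errors='coerce') converts the valid entries and marks the bad one as NaN.

```python
import pandas as pd

# made-up column with one value that is not a number
s = pd.Series(['1.5', '2.0', 'n/a'])

try:
    s.astype(str).astype(float)
    failed = False
except ValueError:
    failed = True  # astype(float) cannot parse 'n/a'
print(failed)  # True

# to_numeric with errors='coerce' converts what it can and leaves NaN elsewhere
converted = pd.to_numeric(s, errors='coerce')
print(converted.isna().tolist())  # [False, False, True]
```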

Upvotes: 1
