Holmesjr
Holmesjr

Reputation: 21

Very specific task with pandas

I have this dataset that represents information on Mars craters. I am only interested in the latitude, longitude, and the number of visible layers columns. I am trying to get the latitude into groups of 10 degrees: -90 to -80, -80 to -70, etc. while taking the layer column values (0,1,2,3,4,5) and turning them into columns themselves to get a table of the value_counts of each layer column value for each 10 degree group.

What I want

What I have

Pulling my hair out on this I seemingly tried everything I could understand.

Upvotes: 2

Views: 218

Answers (2)

B. M.
B. M.

Reputation: 18628

use pd.cut to make groups, and pivot_table to count.

A sample of data :

lat=rand(3000)*180-90
layers=randint(0,6,3000)
data=pd.DataFrame({'lat':lat,'layers':layers})

Make 18 groups :

data['groups'] = pd.cut(lat,linspace(-90,90,19))

And a table :

data.pivot_table(index='groups',columns='layers',aggfunc='count',fill_value=0)

           lat               
layers       0  1  2  3  4  5
groups                       
(-90, -80]   4  1  2  1  1  0
(-80, -70]   1  0  0  2  2  3
(-70, -60]   4  3  2  4  3  4
(-60, -50]   6  2  1  1  2  3
(-50, -40]   2  3  4  2  2  4
(-40, -30]   4  3  4  2  4  4
(-30, -20]   2  5  2  2  3  2
(-20, -10]   4  2  6  3  5  2
(-10, 0]     3  4  2  3  2  1
(0, 10]      5  3  4  3  4  7
(10, 20]     3  3  2  2  2  3
(20, 30]     2  1  1  4  3  5
(30, 40]     1  2  0  2  2  3
(40, 50]     1  3  3  2  3  4
(50, 60]     6  0  2  4  1  6
(60, 70]     3  3  2  5  1  5
(70, 80]     1  4  5  3  2  2
(80, 90]     1  7  3  2  4  2

Upvotes: 1

Uğur Güney
Uğur Güney

Reputation: 174

Does this work for you?

import pandas as pd
import random

# generate random data
N = 100
longitudes = [random.randint(-20, 89) for _ in xrange(N)]
layers = [random.randint(0, 5) for _ in xrange(N)]
data = pd.DataFrame({'LONGITUDE_CIRCLE_IMAGE': longitudes, 'NUMBER_LAYERS': layers})

def get_degree_group(longitude, mn=-20, mx=90, delta_deg=10):
    """calculate the layer from the given longitude"""
    return (longitude - mn) / delta_deg

def make_table(df):
    # make a new column by calculating the degree group from longitude column
    df['degree_group'] = df.LONGITUDE_CIRCLE_IMAGE.apply(get_degree_group)
    # count the number of craters with properties (deg_grp, num_lyr)
    s = df.groupby(['degree_group', 'NUMBER_LAYERS']).size()
    # s is a pandas Series where the index is in the form: (deg_grp, num_lyr)
    # and the values are the counts of crates in that category
    #
    # We want to convert the series into a table where num_lyr values are columns
    # This task is done with unstack method
    table = s.unstack('NUMBER_LAYERS')
    # there are some (deg_grp, num_lyr) cases for which there are no existing craters
    # Pandas put NaN for those cases. It might be better to put 0 into those cells
    table.fillna(0, inplace = True)
    return table

make_table(data)

Upvotes: 3

Related Questions