Reputation: 21
I have this dataset that represents information on Mars craters. I am only interested in the latitude, longitude, and the number of visible layers columns. I am trying to get the latitude into groups of 10 degrees: -90 to -80, -80 to -70, etc. while taking the layer column values (0,1,2,3,4,5) and turning them into columns themselves to get a table of the value_counts of each layer column value for each 10 degree group.
Pulling my hair out on this I seemingly tried everything I could understand.
Upvotes: 2
Views: 218
Reputation: 18628
use pd.cut
to make groups, and pivot_table
to count.
A sample of data :
lat=rand(3000)*180-90
layers=randint(0,6,3000)
data=pd.DataFrame({'lat':lat,'layers':layers})
Make 18 groups :
data['groups'] = pd.cut(lat,linspace(-90,90,19))
And a table :
data.pivot_table(index='groups',columns='layers',aggfunc='count',fill_value=0)
lat
layers 0 1 2 3 4 5
groups
(-90, -80] 4 1 2 1 1 0
(-80, -70] 1 0 0 2 2 3
(-70, -60] 4 3 2 4 3 4
(-60, -50] 6 2 1 1 2 3
(-50, -40] 2 3 4 2 2 4
(-40, -30] 4 3 4 2 4 4
(-30, -20] 2 5 2 2 3 2
(-20, -10] 4 2 6 3 5 2
(-10, 0] 3 4 2 3 2 1
(0, 10] 5 3 4 3 4 7
(10, 20] 3 3 2 2 2 3
(20, 30] 2 1 1 4 3 5
(30, 40] 1 2 0 2 2 3
(40, 50] 1 3 3 2 3 4
(50, 60] 6 0 2 4 1 6
(60, 70] 3 3 2 5 1 5
(70, 80] 1 4 5 3 2 2
(80, 90] 1 7 3 2 4 2
Upvotes: 1
Reputation: 174
Does this work for you?
import pandas as pd
import random
# generate random data
N = 100
longitudes = [random.randint(-20, 89) for _ in xrange(N)]
layers = [random.randint(0, 5) for _ in xrange(N)]
data = pd.DataFrame({'LONGITUDE_CIRCLE_IMAGE': longitudes, 'NUMBER_LAYERS': layers})
def get_degree_group(longitude, mn=-20, mx=90, delta_deg=10):
"""calculate the layer from the given longitude"""
return (longitude - mn) / delta_deg
def make_table(df):
# make a new column by calculating the degree group from longitude column
df['degree_group'] = df.LONGITUDE_CIRCLE_IMAGE.apply(get_degree_group)
# count the number of craters with properties (deg_grp, num_lyr)
s = df.groupby(['degree_group', 'NUMBER_LAYERS']).size()
# s is a pandas Series where the index is in the form: (deg_grp, num_lyr)
# and the values are the counts of crates in that category
#
# We want to convert the series into a table where num_lyr values are columns
# This task is done with unstack method
table = s.unstack('NUMBER_LAYERS')
# there are some (deg_grp, num_lyr) cases for which there are no existing craters
# Pandas put NaN for those cases. It might be better to put 0 into those cells
table.fillna(0, inplace = True)
return table
make_table(data)
Upvotes: 3