beginner

Reputation: 25

Reduce lat/lon points

I have a dataframe with a large number of lat/lon points (305,000). I want to reduce its size by repeatedly taking a sample and calculating the haversine distance between consecutive rows. If the distance is too small, I want to delete one of the two points. How can I do this in Python? I wanted to use shift(), but I don't know the right way to use it. This is what I am trying to do:

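rows = random.sample(df.index, 50)

for i in range(50):
    # draw 1000 random row labels (np.random.choice samples with replacement)
    rows = np.random.choice(df.index.values, 1000)
    # df.ix has been removed from pandas; .loc does the label lookup
    sampled_df = df.loc[rows]
    # pseudocode: distance between each row and the previous one, threshold e
    if haversine(sampled_df, sampled_df.shift()) < e:
        # delete one row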

Upvotes: 1

Views: 212

Answers (2)

Back2Basics

Reputation: 7806

The big questions are "why would you want to do that?" and "what would it gain you once you are finished?" (besides speed). The problem with your approach is deciding which of the two or more nearby points to delete. The answer to how to approach this lies in those big questions. I would suggest one of a few approaches, depending on what you want to be left with: a center point, or a representative point?

A few implementation suggestions: use a groupby or a mask instead of deleting data. For speed reasons, try to avoid explicit for loops in pandas.
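As a rough sketch of the mask idea, the following computes the haversine distance between each row and the previous one with shift() and keeps only the rows that are far enough apart. The column names "lat" and "lon", the helper names haversine_km and thin_consecutive, and the sample coordinates are all assumptions for illustration, not something from the question.

```python
import numpy as np
import pandas as pd

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, an approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Vectorized great-circle distance in kilometres (inputs in degrees)."""
    lat1, lon1, lat2, lon2 = (
        np.radians(np.asarray(a, dtype=float)) for a in (lat1, lon1, lat2, lon2)
    )
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = np.sin(dlat / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def thin_consecutive(df, min_km):
    """Keep rows at least min_km away from the row just before them."""
    prev = df.shift()  # previous row's values; the first row becomes NaN
    dist = haversine_km(prev["lat"], prev["lon"], df["lat"], df["lon"])
    # The first row has no predecessor (NaN distance), so it is always kept.
    keep = np.isnan(dist) | (dist >= min_km)
    return df[keep]  # boolean mask: nothing is deleted from df itself


# Hypothetical data: two near-duplicate pairs and one isolated point.
df = pd.DataFrame({
    "lat": [48.8566, 48.8567, 51.5074, 51.5075, 40.7128],
    "lon": [2.3522, 2.3523, -0.1278, -0.1279, -74.0060],
})
thinned = thin_consecutive(df, min_km=1.0)
```

Note that this is a single pass against the original neighbours: in a run of closely spaced points only the first one survives, even if the last point of the run is far from the first. Whether that is acceptable depends on the "big questions" above.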

Upvotes: -1

ccbunney

Reputation: 2762

How about using a masked array and setting the mask value to True for each point you remove?
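A minimal sketch of that idea with numpy.ma, assuming the points are held in an N x 2 array of (lat, lon) in degrees. The coordinates and the choice of which row to mask are made up for illustration.

```python
import numpy as np

# Hypothetical coordinates; columns are (lat, lon).
points = np.array([
    [48.8566, 2.3522],
    [48.8567, 2.3523],
    [51.5074, -0.1278],
])

# Start with nothing masked; the mask has the same shape as the data.
masked = np.ma.masked_array(points, mask=np.zeros(points.shape, dtype=bool))

# "Remove" the second point by masking its whole row.
masked[1] = np.ma.masked

# Rows with no masked values are the points that remain.
valid = ~np.ma.getmaskarray(masked).any(axis=1)
remaining = points[valid]
```

The original array is left untouched, so a point can be restored later by clearing its mask, and reductions such as masked.mean(axis=0) skip the masked rows.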

Upvotes: 1
