mpy

Reputation: 632

How to slice a dataframe based on specific values in a column and create new dataframes?

I have a pandas dataframe and I want to iterate over its rows and get slices of the data based on the value in a column.

To put it briefly, I have data like below:

districts = [['dist','name','sale','purchase'],['dis1','avelin',2300, 1400],['dis2','matri', 4300, 2500], ['dis1', 'texi', 1500, 1700],['dis2','timi', 2300, 1400]]

I'd like to iterate over all rows and extract dataframes based on the 'dist' column.
The output should look like below:

dis1 = [[2300, 1400], [1500,1700]]
dis2 = [[4300,2500],[2300,1400]]  

Upvotes: 2

Views: 3828

Answers (1)

Brendan

Reputation: 4011

As a preface, you aren't really working with pandas as you currently have your code set up. You have a list of lists, but it is not a pandas dataframe. To actually work with pandas:

import pandas as pd

districts = [['dis1','avelin',2300, 1400],
             ['dis2','matri', 4300, 2500],
             ['dis1', 'texi', 1500, 1700],
             ['dis2','timi', 2300, 1400]]
df = pd.DataFrame(data=districts, columns=['dist','name','sale','purchase'])
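
If you would rather keep the list exactly as it appears in the question, with the header as its first element, a sketch along these lines should build the same frame. The names `header` and `rows` are only illustrative:

```python
import pandas as pd

# The list from the question, with the column names as its first row.
districts = [['dist', 'name', 'sale', 'purchase'],
             ['dis1', 'avelin', 2300, 1400],
             ['dis2', 'matri', 4300, 2500],
             ['dis1', 'texi', 1500, 1700],
             ['dis2', 'timi', 2300, 1400]]

# Split off the first row as column names; the rest are the data rows.
header, *rows = districts
df = pd.DataFrame(rows, columns=header)
```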

From there, the process of subsetting data frames is easy -- 'iteration' is not needed (and rarely is when working with pandas):

dis1 = df.loc[df['dist'] == 'dis1']
dis2 = df.loc[df['dist'] == 'dis2']

This gives the result:

   dist    name  sale  purchase
0  dis1  avelin  2300      1400
2  dis1    texi  1500      1700

   dist   name  sale  purchase
1  dis2  matri  4300      2500
3  dis2   timi  2300      1400
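
If there are many districts and you don't want to write one line per value, `groupby` can do the splitting for you. The following is a minimal sketch, assuming you only want the 'sale' and 'purchase' columns in the nested lists, as in the question; the names `frames` and `values` are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame(
    [['dis1', 'avelin', 2300, 1400],
     ['dis2', 'matri', 4300, 2500],
     ['dis1', 'texi', 1500, 1700],
     ['dis2', 'timi', 2300, 1400]],
    columns=['dist', 'name', 'sale', 'purchase'])

# One sub-frame per distinct value in 'dist', keyed by that value.
frames = {dist: group for dist, group in df.groupby('dist')}

# Nested lists per district, keeping only the numeric columns.
values = {dist: group[['sale', 'purchase']].to_numpy().tolist()
          for dist, group in frames.items()}

print(values['dis1'])  # [[2300, 1400], [1500, 1700]]
print(values['dis2'])  # [[4300, 2500], [2300, 1400]]
```

A dict keyed by district is usually easier to work with than separate variables such as `dis1` and `dis2`, since the keys come from the data itself.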

If you haven't already, you should read through the pandas help pages -- e.g., the Getting Started and Indexing and Selecting Data pages.

Upvotes: 1
