Hannah
Hannah

Reputation: 57

Python DataFrame Copy

I am trying to create a new dataframe based on some criteria based on an original dataframe.

df = pandas.io.sql.read_sql(sql, conn)

Count_Row = df.shape[0]
for j in range(Count_Row - 1):

    if df.iloc[j, 0] == df.iloc[j + 1, 0]:
        print(df.iloc[j, 2] + df.iloc[j + 1, 2], df.iloc[j, 4], df.iloc[j, 6], df.iloc[j, 3])

However instead of printing I want to add that data to a new dataframe.

How is this possible?

Upvotes: 2

Views: 153

Answers (2)

rgalbo
rgalbo

Reputation: 4465

Instead of printing out the data you can append it to a new data frame

import pandas as pd

df = pandas.io.sql.read_sql(sql, conn)
Count_Row = df.shape[0]

results = pd.DataFrame() # create data frame to store results

for j in range(Count_Row - 1):
    if df.iloc[j, 0] == df.iloc[j + 1, 0]:
        # create row of values to append
        row = pd.Series([df.iloc[j, 2] + df.iloc[j + 1, 2], 
                        df.iloc[j, 4], 
                        df.iloc[j, 6], 
                        df.iloc[j, 3]])
        results = results.append([row])

results.columns = ['v1', 'v2', 'v3', 'v4'] # the variables

This will give you a data frame with the desired output

Upvotes: 1

John Zwinck
John Zwinck

Reputation: 249103

Don't use a slow "for" loop to do this. Instead, generate a mask which is True for the elements you want, then select those elements:

matches = df.iloc[:-1,0] == df.iloc[1:,0]
new_df = df.iloc[:-1][matches]

This will be 10-100x faster than the approach you had before.

At the end, new_df will contain copies of the selected rows.

[:-1] means "all elements before the last one."

Upvotes: 1

Related Questions