Reputation: 57
I am trying to create a new dataframe based on some criteria based on an original dataframe.
df = pandas.io.sql.read_sql(sql, conn)
Count_Row = df.shape[0]
for j in range(Count_Row - 1):
if df.iloc[j, 0] == df.iloc[j + 1, 0]:
print(df.iloc[j, 2] + df.iloc[j + 1, 2], df.iloc[j, 4], df.iloc[j, 6], df.iloc[j, 3])
However instead of printing I want to add that data to a new dataframe.
How is this possible?
Upvotes: 2
Views: 153
Reputation: 4465
Instead of printing out the data you can append it to a new data frame
import pandas as pd
df = pandas.io.sql.read_sql(sql, conn)
Count_Row = df.shape[0]
results = pd.DataFrame() # create data frame to store results
for j in range(Count_Row - 1):
if df.iloc[j, 0] == df.iloc[j + 1, 0]:
# create row of values to append
row = pd.Series([df.iloc[j, 2] + df.iloc[j + 1, 2],
df.iloc[j, 4],
df.iloc[j, 6],
df.iloc[j, 3]])
results = results.append([row])
results.columns = ['v1', 'v2', 'v3', 'v4'] # the variables
This will give you a data frame with the desired output
Upvotes: 1
Reputation: 249103
Don't use a slow "for" loop to do this. Instead, generate a mask which is True for the elements you want, then select those elements:
matches = df.iloc[:-1,0] == df.iloc[1:,0]
new_df = df.iloc[:-1][matches]
This will be 10-100x faster than the approach you had before.
At the end, new_df
will contain copies of the selected rows.
[:-1]
means "all elements before the last one."
Upvotes: 1