Reputation: 695
import pandas as pd

d1 = {'id': ['a','b','c'], 'ref': ['apple','orange','banana'], 'value':[1,2,3]}
df1 = pd.DataFrame(d1)
d2 = {'id': ['a','b','c'], 'ref': ['apple','orange','banana'], 'value':[1,2,3]}
df2 = pd.DataFrame(d2)
for df in [df1, df2]:
    df = df[df['id'] == 'a']
I have a list of DataFrames that I'd like to run the same operations on. The loop runs without errors, but df1 and df2 are unchanged afterwards. Above I'm just applying a simple filter, and the result isn't saved. How can I fix this?
UPDATE: I also tried looping through a dict, and that did not work either:
df_dict = {'df1': df1, 'df2': df2}
for df in df_dict.keys():
    df_dict[df] = df_dict[df][df_dict[df]['id'] == 'a']
Upvotes: 1
Views: 131
Reputation: 144
In your loop, df takes the value of each DataFrame created with pd.DataFrame, so inside the loop df always refers to a whole DataFrame.
Your code then reassigns df to a filtered result based only on the id column. Perhaps working with the id comparison directly is closer to what you are looking for:
d1 = {'id': ['a','b','c'], 'ref': ['apple','orange','banana'],'value':[1,2,3]}
df1 = pd.DataFrame(d1)
d2 = {'id': ['a','b','c'], 'ref': ['apple','orange','banana'],'value':[1,2,3]}
df2 = pd.DataFrame(d2)
for df in [df1, df2]:
    # evaluates to a boolean mask (True where id is 'a'); the result is not stored
    df['id'] == 'a'
Let me know if this is what you are looking for.
Upvotes: 1
Reputation: 2967
Why does your loop not assign to df1, df2?
The answer has to do with for loop semantics.
A typical for loop (documentation)
for x in expression_list:
    # statements
assigns to the name/variable x, in turn, each object returned by the iterable that expression_list evaluates to. If x was a variable before the for, the assignments in the for overwrite what it was previously and persist beyond the for. If x was not a variable before the for, then in the first iteration of the loop a new name/variable x is introduced into the program, initially pointing to the first object returned by the iterable. This name also persists: after the for, x will be a variable that refers to the last object assigned to it in the for.
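The rebinding behaviour described above can be seen without pandas at all. This is a minimal sketch in which plain lists stand in for the DataFrames; the names a, b and x are illustrative and not from the question.

```python
# Plain lists stand in for DataFrames; a, b and x are illustrative names.
a = [1, 2, 3]
b = [4, 5, 6]

for x in [a, b]:
    # This rebinds the loop name x to a brand-new list.
    # The lists that a and b refer to are not touched.
    x = [v for v in x if v % 2 == 0]

print(a)  # [1, 2, 3]
print(b)  # [4, 5, 6]
print(x)  # [4, 6]  (x persists after the loop with its last value)
```

Assigning to x inside the loop only changes what x points at, which is exactly what happens to df in the question.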
Your code
for df in [df1, df2]:
    df = df[df['id'] == 'a']
    print(df)
with output
id ref value
0 a apple 1
id ref value
0 a apple 1
does not have your desired effect of assigning to df1 and df2 because in the first iteration of the loop, df is a name that refers to the object associated with the name df1. Once you reassign df (bind it to another object), all you have done is make df refer to another object; df1 is untouched. The same happens in the second iteration of the loop.
As my addition of the print call makes clear, the right-hand side of your assignment evaluates to your desired result, but it is assigned to the name df, not to the names df1 and df2!
After the start of the for loop, df is a name (or variable) just like any other variable. If you print df after the for loop you will get
id ref value
0 a apple 1
which is the last value that was assigned to df (i.e. from the last iteration of the loop).
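To check this on actual DataFrames, the sketch below reruns the loop from the question and then inspects all three names. The frames are trimmed to two columns to keep the example short, which is a simplification and not the original data.

```python
import pandas as pd

# Trimmed-down versions of the question's frames (assumption: 'ref' column omitted).
df1 = pd.DataFrame({'id': ['a', 'b', 'c'], 'value': [1, 2, 3]})
df2 = pd.DataFrame({'id': ['a', 'b', 'c'], 'value': [1, 2, 3]})

for df in [df1, df2]:
    df = df[df['id'] == 'a']  # rebinds df only

print(len(df1), len(df2))  # 3 3  (originals still have all rows)
print(len(df))             # 1    (df holds the last filtered result)
print(df is df2)           # False (df now refers to a different object)
```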
Potential Solution
One way to do what you want is to store the DataFrames as values in a dictionary.
import pandas as pd
# data
d1 = {'id': ['a','b','c'], 'ref': ['apple','orange','banana'], 'value':[1,2,3]}
d2 = {'id': ['a','b','c'], 'ref': ['apple','orange','banana'], 'value':[1,2,3]}
# dict storing DataFrames
d = {'df1': pd.DataFrame(d1), 'df2': pd.DataFrame(d2)}
for key in d:
    d[key] = d[key][d[key]['id'] == 'a']
print(d['df1'])
print(d['df2'])
Output
id ref value
0 a apple 1
id ref value
0 a apple 1
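If you would rather keep the names df1 and df2 than switch to a dictionary, another option is to build the filtered frames in a list comprehension and unpack the result back into the same names. This is a sketch of that alternative, not part of the dictionary approach above.

```python
import pandas as pd

d1 = {'id': ['a', 'b', 'c'], 'ref': ['apple', 'orange', 'banana'], 'value': [1, 2, 3]}
d2 = {'id': ['a', 'b', 'c'], 'ref': ['apple', 'orange', 'banana'], 'value': [1, 2, 3]}
df1 = pd.DataFrame(d1)
df2 = pd.DataFrame(d2)

# Filter each frame, then rebind df1 and df2 explicitly by unpacking.
df1, df2 = [df[df['id'] == 'a'] for df in (df1, df2)]

print(df1)
print(df2)
```

The unpacking has to list every name by hand, so this suits a small, fixed number of frames; for many frames a dictionary or list is likely easier to manage.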
Upvotes: 1
Reputation: 1703
You can use drop() with inplace=True to filter the original pd.DataFrame, which was defined previously.
This works because drop with inplace=True modifies the original object that df refers to. In your solution, the filter creates a new DataFrame and binds it to the name df, which leaves the original df1 and df2 unaltered.
import pandas as pd
d1 = {'id': ['a','b','c'], 'ref': ['apple','orange','banana'], 'value':[1,2,3]}
df1 = pd.DataFrame(d1)
d2 = {'id': ['a','b','c'], 'ref': ['apple','orange','banana'], 'value':[1,2,3]}
df2 = pd.DataFrame(d2)
for df in [df1, df2]:
    df.drop(df[df['id'] != 'a'].index, inplace=True)
Output:
id ref value
0 a apple 1
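The difference from the question's loop is that nothing is rebound here: df and df1 stay the same object, and that object is changed in place. A small sketch of that check, using a trimmed-down frame as an assumption:

```python
import pandas as pd

# Trimmed-down frame (assumption: 'ref' column omitted for brevity).
df1 = pd.DataFrame({'id': ['a', 'b', 'c'], 'value': [1, 2, 3]})

for df in [df1]:
    # Mutates the object that both df and df1 refer to.
    df.drop(df[df['id'] != 'a'].index, inplace=True)

print(df is df1)  # True (same object, no rebinding happened)
print(len(df1))   # 1
```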
Upvotes: 1
Reputation: 2787
Assuming your DataFrames are all going to be named "df" + i:
for i, df in enumerate([df1, df2]):
    df.name = "df" + str(i+1)
    globals()[df.name] = df[df['id'] == 'a']
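If writing into globals() feels too implicit, a similar effect can be had by keeping the frames in a list and writing each result back by index. This is a sketch of that alternative; the list name frames is hypothetical and not from the question.

```python
import pandas as pd

d1 = {'id': ['a', 'b', 'c'], 'ref': ['apple', 'orange', 'banana'], 'value': [1, 2, 3]}
d2 = {'id': ['a', 'b', 'c'], 'ref': ['apple', 'orange', 'banana'], 'value': [1, 2, 3]}

# 'frames' is a hypothetical container name for this sketch.
frames = [pd.DataFrame(d1), pd.DataFrame(d2)]

for i, df in enumerate(frames):
    # Assigning to frames[i] updates the list slot, not just the loop name.
    frames[i] = df[df['id'] == 'a']

print(frames[0])
print(frames[1])
```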
Upvotes: 1