Reputation: 1679
I have a dataframe that looks like this:
length  code1  code2  code3
4       0      1      1
8       1      1      0
7       1      0      0
I want to write a function that checks the value in length. If the value is >= 7, I want to add 1 to the value present in code2 and code3. What is the best way to do this? So far, I have:
def char_count_pred(df):
    if df.length >= 7:
        df.code2 += 1
        df.code3 += 1
    return df
master_df = char_count_pred(master_df)
I understand I need to build a loop to iterate over each row, but I am confused about the most efficient way to loop through rows and perform tasks on multiple columns.
When trying the solutions below, I get the same errors:
When I try the script as is....
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
C:\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in get_loc(self, key, method, tolerance)
2889 try:
-> 2890 return self._engine.get_loc(key)
2891 except KeyError:
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index.pyx in pandas._libs.index.IndexEngine.get_loc()
pandas\_libs\index_class_helper.pxi in pandas._libs.index.Int64Engine._check_type()
KeyError: True
When I assign the result of the script to a variable...
File "<ipython-input-140-9f2f40a5bb96>", line 3
df = df.loc[df.length>=7]+=1
^
SyntaxError: invalid syntax
Upvotes: 0
Views: 80
Reputation: 1085
I hope this will help:
EDIT
This is everything that I am using:
Definition of your dataframe:
import pandas as pd
df = pd.DataFrame(columns=["length","code1","code2","code3"],
                  data=[[4,0,1,1],
                        [8,1,1,0],
                        [7,1,0,0]])
Definition of the function:
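def char_count_pred(df):
    # Add 1 to code2 and code3 in rows where length >= 7.
    for col in ["code2", "code3"]:
        # A single .loc call (row mask, column) avoids chained assignment.
        df.loc[df["length"] >= 7, col] += 1
char_count_pred(df)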
Everything works for me; I don't know where the problem is.
Upvotes: 1
Reputation: 11067
You could perform an apply, which is a flexible way of modifying your columns based on some function, although a row-wise apply is usually slower than a vectorized update on large dataframes.
There's an answer here you could take a look at, or - try something like this as a template for your specific use-case:
master_df["code2"] = master_df.apply(lambda x : x["code2"] + 1 if x["length"] >= 7 else x["code2"], axis=1)
This will update your "code2" field by applying a function (in this case an anonymous lambda function, but it could equally be a named function as per your def). The only limitation is that it's simpler if those functions target a single column at a time.
There are methods for updating or generating results for multiple columns at once, but it might be simpler to start off by updating a single column at a time.
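If you do want to touch both columns in one pass, a rough sketch with a named function might look like the following. The helper name bump_codes is made up for illustration, and the sample data is copied from the question; treat it as one possible approach rather than the recommended one.

```python
import pandas as pd

# Sample data copied from the question.
master_df = pd.DataFrame(
    {"length": [4, 8, 7], "code1": [0, 1, 1], "code2": [1, 1, 0], "code3": [1, 0, 0]}
)


def bump_codes(row):
    # Hypothetical helper: return the row with code2 and code3
    # incremented when length >= 7, otherwise return it unchanged.
    if row["length"] >= 7:
        row = row.copy()  # avoid mutating the object pandas passes in
        row["code2"] += 1
        row["code3"] += 1
    return row


# axis=1 calls bump_codes once per row; the returned rows are
# reassembled into a dataframe with the same columns.
master_df = master_df.apply(bump_codes, axis=1)
print(master_df)
```

Because the function runs once per row in Python, this is convenient for complicated logic but will be noticeably slower than a vectorized update on large dataframes.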
Upvotes: 0
Reputation: 2579
df.loc[df['length']>=7, 'code2':] += 1
Use .loc to select the rows where length is greater than or equal to 7, then select the columns from code2 onwards and add 1.
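As a minimal sketch of the same idea on the sample data from the question, here is a variant that names the columns explicitly instead of slicing with 'code2':. The explicit list is my own assumption about what is wanted; it keeps the update from spreading to any column that might later be added after code3.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    columns=["length", "code1", "code2", "code3"],
    data=[[4, 0, 1, 1], [8, 1, 1, 0], [7, 1, 0, 0]],
)

# Boolean mask: True for rows where length is at least 7.
mask = df["length"] >= 7

# Row selector and column selector go in the same .loc call,
# so the += writes back to df rather than to a temporary copy.
df.loc[mask, ["code2", "code3"]] += 1

print(df)
```

The rows with length 8 and 7 each gain 1 in code2 and code3, while the row with length 4 is left alone.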
Upvotes: 2