Reputation: 11
I am pretty new to Python programming and have a question about conditionally replacing numbers in a DataFrame. For example, I have a DataFrame with 5 days of data, one column per day: day1, day2, day3, day4 and day5. Each day has 5 data points, some of which are larger than 5. Now I want to set every value larger than 5 to 1. How can I do that? Should I loop over each column, find the specific elements and change them, or is there a faster way? Thanks,
Upvotes: 1
Views: 2138
Reputation: 3493
This iterates over each column and replaces every value greater than 5 with 1. Iterating by rows instead of columns is also an option with iterrows, as discussed here, but it's generally slower.
import pandas as pd

# Sample data: one column per day
data = {'day1': pd.Series([1, 2, 3]),
        'day2': pd.Series([1, 4, 6]),
        'day3': pd.Series([5, 4, 3]),
        'day4': pd.Series([2, 4, 6]),
        'day5': pd.Series([7, 3, 2])}
df = pd.DataFrame(data)

# Replace every value greater than 5 with 1, one column at a time
for col in df.columns:
    df[col] = [x if x <= 5 else 1 for x in df[col]]
Upvotes: 0
Reputation: 936
To do this without looping (which is usually faster), you can use boolean indexing:
df[df > 5] = 1
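As a rough sketch of what that one-liner does: `df > 5` builds a boolean DataFrame of the same shape, and assigning through it overwrites only the cells marked True. The sample values below are illustrative, not the asker's data, and the names `too_large` and `capped` are made up for this example. `DataFrame.mask` is shown as a non-mutating alternative, in case you want to keep the original frame.

```python
import pandas as pd

# Illustrative data, one column per day (not the asker's real values).
df = pd.DataFrame({
    "day1": [1, 2, 3],
    "day2": [1, 4, 6],
    "day3": [5, 4, 3],
    "day4": [2, 4, 6],
    "day5": [7, 3, 2],
})

# Boolean DataFrame with the same shape as df: True where a value exceeds 5.
too_large = df > 5

# Non-mutating alternative: mask() returns a new DataFrame with 1 wherever
# the condition is True and leaves df itself unchanged.
capped = df.mask(too_large, 1)

# In-place form: overwrite only the cells flagged by the boolean mask.
df[too_large] = 1
```

Both forms should give the same values here. The in-place version is shorter, while `mask` (or its inverse, `df.where(df <= 5, 1)`) is handy when the original data is still needed afterwards.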
Upvotes: 1