Reputation: 1844
I am trying to replace values in certain columns in a pandas dataframe. Because there are a number of changes to make, I’m approaching it with a for loop (though I am not wedded to this as the answer). I am only starting out with python, so huge apologies if this is obvious – I can’t find anything that seems to solve it.
Say I have a dataframe that is something like this:
import pandas as pd
weather_data = [["unknown", "rainy"], ["unknown", "sun"], ["rainy", "not sunny at all"], ["stormy", "a lot of rain"]]
weather = pd.DataFrame(weather_data, columns = ["weather", "weather_note"])
Where the weather data is unknown, I want to use text from the notes to fill in the data. For example, if it says “rain” then I want the weather value to be “rainy”, assuming it was previously unknown.
I have tried this:
weather_text = ["rain", "sun"]
weather_label = ["rainy", "sunny"]
for i in range(len(weather_text)):
weather.loc[weather['weather_note'].str.contains(weather_text[i], na = False) &
weather['weather'].str.contains("unknown")] = weather_label[i]
This changes every value in the row that meets the condition to whatever is in weather_label. I can see why it’s doing that, but I’m not sure how to only change the relevant column. I’ve tried this:
for i in range(len(weather_text)):
weather.loc[weather['weather_note'].str.contains(weather_text[i], na = False) &
weather['weather']str.contains("unknown")]
weather['weather'] = weather_label[i]
But then the value is changed to the last value in the weather_label list, not the one at the same index position.
In my real data, there are a lot more combinations of patterns and values so I am not keen to run every combination individually.
Can anybody help?
Upvotes: 0
Views: 214
Reputation: 38415
Assign value from weather_note if value in weather is 'unknown'. Replace words like sun with sunny using df.replace.
weather.loc[weather['weather'] == 'unknown', 'weather'] = weather['weather_note']
weather['weather'].replace('sun', 'sunny', inplace = True)
weather weather_note
0 rainy rainy
1 sunny sun
2 rainy not sunny at all
3 stormy a lot of rain
Upvotes: 0
Reputation: 2403
Here's how I'd do it. I used numpy in this code... hope that's okay. I just like numpy's vectorize method a lot. Pandas has an equivalent, but I don't tend to use it. The vectorize method (seen in last line of the code) is made for situations like this where you want to do *something* to a whole column, but it does it without requiring you to specify a loop in your code (it does the loop behind-the-scenes for you).
import pandas as pd
import numpy as np
weather_data = [["unknown", "rainy"], ["unknown", "sun"], ["rainy", "not sunny at all"], ["stormy", "a lot of rain"]]
weather = pd.DataFrame(weather_data, columns = ["weather", "weather_note"])
weather_indicators = {'rain': 'rainy',
'drizzle': 'rainy',
'sun': 'sunny',
'bright': 'sunny',
# add each pattern to this dictionary
}
def determine_weather(weather, weather_note):
output = weather
if weather == 'unknown':
for indicator in weather_indicators:
if indicator in weather_note:
output = weather_indicators[indicator]
return output
weather['weather'] = np.vectorize(determine_weather)(weather['weather'], weather['weather_note'])
I use a dictionary object named weather_indicators
to store the patterns. You can add more patterns to it. If the amount of patterns is very large (like hundreds), then perhaps consider storing them in some other object like a database table or csv file or something and then reading that into the code. You'll obviously have to rework the above code at that point, since that's out of scope of your question.
But basically I create a function that looks for a certain indicator word (e.g. "rain") and if the word is in the weather_note
value then I set the weather
column to be the specified value from the weather_indicator
dictionary object. Then apply the function to the weather
column of the data frame using numpy's vectorize function.
Upvotes: 1