Jaccar
Jaccar

Reputation: 1844

Change column values conditionally and repeat several times

I am trying to replace values in certain columns in a pandas dataframe. Because there are a number of changes to make, I’m approaching it with a for loop (though I am not wedded to this as the answer). I am only starting out with python, so huge apologies if this is obvious – I can’t find anything that seems to solve it.

Say I have a dataframe that is something like this:

import pandas as pd

weather_data = [["unknown", "rainy"], ["unknown", "sun"], ["rainy", "not sunny at all"], ["stormy", "a lot of rain"]]
weather = pd.DataFrame(weather_data, columns = ["weather", "weather_note"])

Where the weather data is unknown, I want to use text from the notes to fill in the data. For example, if it says “rain” then I want the weather value to be “rainy”, assuming it was previously unknown.

I have tried this:

weather_text = ["rain", "sun"]
weather_label = ["rainy", "sunny"]

for i in range(len(weather_text)):
    weather.loc[weather['weather_note'].str.contains(weather_text[i], na = False) & 
               weather['weather'].str.contains("unknown")] = weather_label[i]

This changes every value in the row that meets the condition to whatever is in weather_label. I can see why it’s doing that, but I’m not sure how to only change the relevant column. I’ve tried this:

for i in range(len(weather_text)):
    weather.loc[weather['weather_note'].str.contains(weather_text[i], na = False) & 
               weather['weather']str.contains("unknown")]
    weather['weather'] = weather_label[i]

But then the value is changed to the last value in the weather_label list, not the one at the same index position.

In my real data, there are a lot more combinations of patterns and values so I am not keen to run every combination individually.

Can anybody help?

Upvotes: 0

Views: 214

Answers (2)

Vaishali
Vaishali

Reputation: 38415

Assign value from weather_note if value in weather is 'unknown'. Replace words like sun with sunny using df.replace.

weather.loc[weather['weather'] == 'unknown', 'weather'] = weather['weather_note']
weather['weather'].replace('sun', 'sunny', inplace = True)

    weather weather_note
0   rainy   rainy
1   sunny   sun
2   rainy   not sunny at all
3   stormy  a lot of rain

Upvotes: 0

LetEpsilonBeLessThanZero
LetEpsilonBeLessThanZero

Reputation: 2403

Here's how I'd do it. I used numpy in this code... hope that's okay. I just like numpy's vectorize method a lot. Pandas has an equivalent, but I don't tend to use it. The vectorize method (seen in last line of the code) is made for situations like this where you want to do *something* to a whole column, but it does it without requiring you to specify a loop in your code (it does the loop behind-the-scenes for you).

import pandas as pd
import numpy as np

weather_data = [["unknown", "rainy"], ["unknown", "sun"], ["rainy", "not sunny at all"], ["stormy", "a lot of rain"]]
weather = pd.DataFrame(weather_data, columns = ["weather", "weather_note"])

weather_indicators = {'rain': 'rainy',
                      'drizzle': 'rainy',
                      'sun': 'sunny',
                      'bright': 'sunny',
                      # add each pattern to this dictionary
                      }

def determine_weather(weather, weather_note):
    output = weather
    if weather == 'unknown':
        for indicator in weather_indicators:
            if indicator in weather_note:
                output = weather_indicators[indicator]
    return output


weather['weather'] = np.vectorize(determine_weather)(weather['weather'], weather['weather_note'])

I use a dictionary object named weather_indicators to store the patterns. You can add more patterns to it. If the amount of patterns is very large (like hundreds), then perhaps consider storing them in some other object like a database table or csv file or something and then reading that into the code. You'll obviously have to rework the above code at that point, since that's out of scope of your question.

But basically I create a function that looks for a certain indicator word (e.g. "rain") and if the word is in the weather_note value then I set the weather column to be the specified value from the weather_indicator dictionary object. Then apply the function to the weather column of the data frame using numpy's vectorize function.

Upvotes: 1

Related Questions