Jean Hu

Reputation: 115

Python: column calculation when the column contains text strings

I am new to Python. I need to do a calculation on a column that contains text strings as well as numbers.

For example:

import pandas as pd
data = [1,2,'s','s',5,6,7,8,'s']
df = pd.DataFrame(data)

I want to create a new column using .diff(). However, it cannot compute the difference between an int and a str.

df.diff()    
TypeError: unsupported operand type(s) for -: 'str' and 'int'

The new column should look like this:

   obs new_col
0    1      na
1    2       1
2    s       s
3    s       s
4    5       5
5    6       1
6    7       1
7    8       1
8    s       s

Does anyone know how to do this? Thanks! JH

Upvotes: 1

Views: 418

Answers (3)

Kishore Sampath

Reputation: 1001

I wrote a custom function that works much like pandas' diff(), but passes strings through unchanged, which covers your use case.

import numpy as np

def diff(dataframe, col_name, new_col_name, periods=1):
    # List which stores the values of the new columns
    new_col_value = []
    
    # Previous Value in the column
    prev_value = None
    
    # Periods counts for skipping
    periods_count = 1
    
    # Looping through the specified column
    for i in range(len(dataframe[col_name])):
        
        # Conditional for skipping the rows
        if periods_count <= periods:
            new_col_value.append(np.nan)
            prev_value = dataframe[col_name][i]
            periods_count += 1
            
        # Conditional for checking the datatypes
        # If the datatype is int
        elif type(dataframe[col_name][i]) != str:
            # If the previous value is a string
            if (type(prev_value) == str):
                prev_value = dataframe[col_name][i]
                new_col_value.append(prev_value)
                
            # If the previous value is int
            else:
                new_col_value.append(dataframe[col_name][i] - prev_value)
                prev_value = dataframe[col_name][i]
        
        # If the value is of string datatype
        else:
            prev_value = dataframe[col_name][i]            
            new_col_value.append(prev_value)
    
    # Creating the new column in the dataframe
    dataframe[new_col_name] = new_col_value
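
For comparison, the same rule for periods=1 can be written as a single pass over consecutive pairs of values. This is only a sketch, not the function above: `mixed_diff` is a hypothetical name, and it assumes every non-numeric entry is a `str`.

```python
import numpy as np
import pandas as pd

def mixed_diff(values):
    """Row-to-row difference that passes strings through unchanged."""
    out = [np.nan]  # the first row has no predecessor
    for prev, cur in zip(values, values[1:]):
        if isinstance(cur, str) or isinstance(prev, str):
            # A string, or a number that directly follows a string: keep as-is
            out.append(cur)
        else:
            # Both values are numeric: ordinary difference
            out.append(cur - prev)
    return out

df = pd.DataFrame({0: [1, 2, 's', 's', 5, 6, 7, 8, 's']})
df['new_col'] = mixed_diff(df[0].tolist())
print(df)
```

Working on a plain list avoids label-based lookups such as `dataframe[col_name][i]`, so the sketch does not depend on the DataFrame having the default 0..n-1 index.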

Upvotes: 1

anky

Reputation: 75100

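Convert the column to numeric with errors='coerce' (strings become NaN), take the diff, then use fillna to restore the original value wherever the diff is NaN. The first row has no predecessor, so fillna gives it its own value instead of NaN; hardcode that one row: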

import numpy as np

df['new_col'] = pd.to_numeric(df[0], errors='coerce').diff().fillna(df[0])
df.loc[0, 'new_col'] = np.nan

print(df)

   0 new_col
0  1     NaN
1  2       1
2  s       s
3  s       s
4  5       5
5  6       1
6  7       1
7  8       1
8  s       s
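
To see why the chained expression works, it may help to split it into its steps and look at each intermediate result. The sketch below rebuilds the question's data; the names `numeric`, `delta` and `filled` are made up for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({0: [1, 2, 's', 's', 5, 6, 7, 8, 's']})

# Step 1: strings cannot be parsed as numbers, so they become NaN
numeric = pd.to_numeric(df[0], errors='coerce')

# Step 2: the diff is NaN wherever the current or the previous value is NaN
delta = numeric.diff()

# Step 3: every NaN slot takes the original value from the same row
filled = delta.fillna(df[0])

# The first row was filled with its own value; reset it to NaN by hand
filled.loc[0] = np.nan

print(pd.DataFrame({'numeric': numeric, 'delta': delta, 'filled': filled}))
```

Note that step 3 cannot tell "string" apart from "number right after a string": both are NaN in `delta`, and both end up with the original value, which is exactly what the question asks for.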

Upvotes: 2

Ade_1

Reputation: 1486

You can try something like this and adapt it:

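# Assumes the column is named 'obs'; the question's df uses the default name 0
df['new_col'] = df['obs'].shift(1)  # helper column: the previous row's value

def calc(x):
    # Subtract only when both the current and the previous value are ints
    if type(x['obs']) == int and type(x['new_col']) == int:
        return x['obs'] - x['new_col']
    else:
        # Strings, values right after a string, and the first row stay as-is
        return x['obs']

df['new_col'] = df.apply(calc, axis=1)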
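
The same rule (subtract only when both the current and the previous value are numbers) can also be sketched without a row-wise apply, using a boolean mask. This assumes the column is named 'obs', as in this answer; `num` and `both_numeric` are names chosen for illustration.

```python
import pandas as pd

df = pd.DataFrame({'obs': [1, 2, 's', 's', 5, 6, 7, 8, 's']})

# Numeric view of the column; strings become NaN
num = pd.to_numeric(df['obs'], errors='coerce')

# True only where this row and the previous row are both numbers
both_numeric = num.notna() & num.shift().notna()

# Keep the original value unless both rows are numeric, else use the difference
df['new_col'] = df['obs'].where(~both_numeric, num.diff())
print(df)
```

In this sketch the first row keeps its own value rather than NaN, because it has no previous row to compare with; set it by hand if a missing value is wanted there.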

Upvotes: 1
