Reputation: 115
I am new to Python. I need to do a calculation on a column that contains text strings as well as numbers.
For example:
import pandas as pd
data = [1,2,'s','s',5,6,7,8,'s']
df = pd.DataFrame(data)
I want to create a new column using .diff(). However, it cannot do a calculation between an int and a str:
df.diff()
TypeError: unsupported operand type(s) for -: 'str' and 'int'
The new column should look like this:
   obs  new_col
0    1       na
1    2        1
2    s        s
3    s        s
4    5        5
5    6        1
6    7        1
7    8        1
8    s        s
Does anyone know how to do this? Thanks! JH
Upvotes: 1
Views: 418
Reputation: 1001
I have written a custom function, very similar to the pandas diff() function, that handles your use case. It modifies the DataFrame in place by adding the result as a new column.
import numpy as np

def diff(dataframe, col_name, new_col_name, periods=1):
    # List which stores the values of the new column
    new_col_value = []
    # Previous value in the column
    prev_value = None
    # Period counter for skipping the leading rows
    periods_count = 1
    # Looping through the specified column
    for i in range(len(dataframe[col_name])):
        # Conditional for skipping the rows
        if periods_count <= periods:
            new_col_value.append(np.nan)
            prev_value = dataframe[col_name][i]
            periods_count += 1
        # Conditional for checking the datatypes
        # If the datatype is int
        elif type(dataframe[col_name][i]) != str:
            # If the previous value is a string
            if (type(prev_value) == str):
                prev_value = dataframe[col_name][i]
                new_col_value.append(prev_value)
            # If the previous value is int
            else:
                new_col_value.append(dataframe[col_name][i] - prev_value)
                prev_value = dataframe[col_name][i]
        # If the value is of string datatype
        else:
            prev_value = dataframe[col_name][i]
            new_col_value.append(prev_value)
    # Creating the new column in the dataframe
    dataframe[new_col_name] = new_col_value
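For the common case of periods=1, the same rule can be sketched more compactly by pairing each value with its predecessor. This is only an illustration, not a drop-in replacement for the function above: the name mixed_diff and the column name 'obs' are assumptions made for the example.

```python
import numpy as np
import pandas as pd

def mixed_diff(values):
    """Hypothetical helper: difference to the previous item when both are
    numbers; otherwise keep the current item as it is."""
    values = list(values)
    if not values:
        return []
    result = [np.nan]  # the first item has no predecessor
    for prev, cur in zip(values, values[1:]):
        if isinstance(cur, str) or isinstance(prev, str):
            # A string is involved, so no subtraction is possible
            result.append(cur)
        else:
            result.append(cur - prev)
    return result

# Example data from the question, with the column named 'obs' (an assumption)
df = pd.DataFrame({'obs': [1, 2, 's', 's', 5, 6, 7, 8, 's']})
df['new_col'] = mixed_diff(df['obs'])
print(df)
```

Because the helper takes any iterable, it can also be tried on a plain list before applying it to a DataFrame column.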
Upvotes: 1
Reputation: 75100
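Convert the column to numeric with errors='coerce' (strings become NaN), call diff(), and then use fillna to fill the resulting NaN values from the original column. Since the first row's diff is always NaN and would otherwise be filled with the original value as well, set it back to NaN explicitly: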
import numpy as np
df['new_col'] = pd.to_numeric(df[0], errors='coerce').diff().fillna(df[0])
df.loc[0, 'new_col'] = np.nan
print(df)
   0 new_col
0  1     NaN
1  2       1
2  s       s
3  s       s
4  5       5
5  6       1
6  7       1
7  8       1
8  s       s
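To see why this works, it may help to split the chain into its intermediate steps. The sketch below is an illustration under a couple of assumptions: the variable names numeric, delta and filled are made up for the example, and the diff is cast to object dtype before filling so that numbers and strings can share one column without relying on implicit upcasting.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([1, 2, 's', 's', 5, 6, 7, 8, 's'])

# Step 1: strings cannot be parsed, so errors='coerce' turns them into NaN
numeric = pd.to_numeric(df[0], errors='coerce')

# Step 2: ordinary diff; any row whose value or predecessor is NaN yields NaN
delta = numeric.diff()

# Step 3: wherever the diff is NaN, fall back to the original value
filled = delta.astype(object).fillna(df[0])

# Step 4: the first row has no predecessor, so restore NaN there
filled.iloc[0] = np.nan

print(pd.DataFrame({'numeric': numeric, 'delta': delta, 'new_col': filled}))
```

Note that a number directly after a string (row 4 here) keeps its own value, because its diff is NaN and fillna restores the original entry.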
Upvotes: 2
Reputation: 1486
You can try something like this and adapt it to your data:
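df['new_col'] = df['obs'].shift(1)  # helper column: the previous row's value
def calc(x):
    # Subtract only when both the current and the previous value are ints
    if type(x['obs']) == int and type(x['new_col']) == int:
        return x['obs'] - x['new_col']
    # Otherwise (a string is involved, or there is no previous row) keep the value itself
    else:
        return x['obs']
df['new_col'] = df.apply(calc, axis=1)  # overwrite the helper column with the result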
Upvotes: 1