Himanshu Gautam
Himanshu Gautam

Reputation: 399

Dynamic index String slicing in Pandas Dataframe

I have a dataframe column with either a single integer value or a range of two integers. I want to create a final column where the range values are averaged, so that the column become purely integer.

I am trying to use pandas.str.find("-") to find the location of break point and then use np.where() to create separate columns for First Value and Second Value in Range.

import numpy as np
import pandas as pd

d = {'A' : ['1234', '12 - 16'], 'Avg':[1234, 14]}
df= pd.DataFrame(d)

df['bp'] = df['A'].str.find("-")
df['F'] = np.where(df['bp']>0, df['A'].str.slice(0, df['bp']), df['A'])

I am getting NAN where range is present in column. Expected Output is in Column "Avg".

Upvotes: 1

Views: 1378

Answers (3)

Andy L.
Andy L.

Reputation: 25259

Use extractall and call mean directly on level=0

df.A.str.extractall(r'(\d+)').astype(int).mean(level=0)[0]

Out[64]:
0    1234
1      14
Name: 0, dtype: int32

Upvotes: 0

Dev Khadka
Dev Khadka

Reputation: 5451

you can do it using vectorization (with out apply function like below) using str function and explode (pandas above 0.25)

your index must be unique or you need to call df.reset_index for this to work

import pandas as pd

d = {'A' : ['1234', '12 - 16'], 'Avg':[1234, 14]}
df= pd.DataFrame(d)
df["A"].str.split("-").explode().astype(pd.np.int).groupby(level=0).mean()

Upvotes: 0

rafaelc
rafaelc

Reputation: 59274

Using str.split

df['A'].str.split(' - ').apply(lambda s: sum(map(int,s))/len(s),1)

0    1234.0
1      14.0
Name: A, dtype: float64

Upvotes: 2

Related Questions