Pandas split column name

Question

I have a test dataframe that looks something like this:

data = pd.DataFrame([[0,0,0,3,6,5,6,1],[1,1,1,3,4,5,2,0],[2,1,0,3,6,5,6,1],[3,0,0,2,9,4,2,1]], columns=["id", "sex", "split", "group0Low", "group0High", "group1Low", "group1High", "trim"])

grouped = data.groupby(['sex','split']).mean()

stacked = grouped.stack().reset_index(level=2)
stacked.columns = ['group_level', 'mean']

Next, I want to separate out group_level and stack those 2 new factors:

stacked['group'] = stacked.group_level.str[:6]
stacked['level'] = stacked.group_level.str[6:]

This all works fine. My question is this:

This works if my column names ("group0Low", "group0High", "group1Low", "group1High") have something in common with each other.

What if instead my column names were more like "routeLow", "routeHigh", "landmarkLow", "landmarkHigh"? How would I use str to split group_level in this case?

This question is similar to this one posted here: Slice/split string Series at various positions

The difference is all of my column subnames are different and have no commonality (whereas in the other post everything had group or class in the name). Is there a regex string, or some other method, I can use to do this stacking?

Jianxun Li · Accepted Answer

Here is another way. It assumes that low/high group ends with the words Low and High respectively, so that we can use .str.endswith() to identify which rows are Low/High.

Here is the sample data

df = pd.DataFrame('group0Low group0High group1Low group1High routeLow routeHigh landmarkLow landmarkHigh'.split(), columns=['group_level'])
df

    group_level
0     group0Low
1    group0High
2     group1Low
3    group1High
4      routeLow
5     routeHigh
6   landmarkLow
7  landmarkHigh

Use np.where, we can do the following

df['level'] = np.where(df['group_level'].str.endswith('Low'), 'Low', 'High')
df['group'] = np.where(df['group_level'].str.endswith('Low'), df['group_level'].str[:-3], df['group_level'].str[:-4])

df

    group_level level     group
0     group0Low   Low    group0
1    group0High  High    group0
2     group1Low   Low    group1
3    group1High  High    group1
4      routeLow   Low     route
5     routeHigh  High     route
6   landmarkLow   Low  landmark
7  landmarkHigh  High  landmark

Pandas split column name

Answers (2)

Related Questions