Reputation: 4652
How do I turn the first column of this DataFrame
which is mixed strings and integers
import pandas as pd

df = pd.DataFrame(
[
["title1", "a", "b", "c", "d"],
[1, 2, 3, 4, 5],
[10, 2, 3, 4, 5],
[100, 2, 3, 4, 5],
["title2", "a", "b", "c", "d"],
[1, 2, 3, 4, 5],
[10, 2, 3, 4, 5],
[100, 2, 3, 4, 5],
["title3", "a", "b", "c", "d"],
[1, 2, 3, 4, 5],
[10, 2, 3, 4, 5],
[100, 2, 3, 4, 5],
]
)
looking like this
title1 a b c d
1 2 3 4 5
10 2 3 4 5
100 2 3 4 5
title2 a b c d
1 2 3 4 5
10 2 3 4 5
100 2 3 4 5
title3 a b c d
1 2 3 4 5
10 2 3 4 5
100 2 3 4 5
into a MultiIndex
with the strings in the top level and the integers in the second?
a b c d
title1 1 2 3 4 5
10 2 3 4 5
100 2 3 4 5
title2 1 2 3 4 5
10 2 3 4 5
100 2 3 4 5
title3 1 2 3 4 5
10 2 3 4 5
100 2 3 4 5
Upvotes: 3
Views: 3746
Reputation: 2757
The key to this type of issue is to create a boolean Series that marks the rows holding the level_0 index values:
mask = df[1].str.contains('a', na=False) # True on the title rows, which hold the header string 'a' in column 1
header = df.loc[0, 1:4].to_list() # Column names, taken from the first title row
df[-1] = df[0].where(mask).ffill() # Separate level_0 column: each title forward-filled over its rows
result = (df[~mask] # Keep only the data rows
.set_index([-1, 0])
.astype(int)
.rename_axis([None, None])
.set_axis(header, axis=1)) # Assign the header as column names
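A self-contained variation on the same idea, as a rough sketch: it rebuilds the sample frame and, instead of matching the letter 'a', assumes a row is a title row exactly when its first cell is a string. The helper name to_multiindex is made up for illustration and is not part of pandas.

```python
import pandas as pd

def to_multiindex(raw):
    """Hypothetical helper: move the title rows of column 0 into index level 0."""
    # Assumption: a row is a title row exactly when its first cell is a string.
    is_title = raw[0].map(lambda v: isinstance(v, str))
    # Column names come from the first title row (everything after column 0).
    header = raw.loc[is_title.idxmax()].iloc[1:].tolist()
    # Forward-fill each title over the data rows that follow it.
    titles = raw[0].where(is_title).ffill()
    data = raw[~is_title]
    index = pd.MultiIndex.from_arrays(
        [titles[~is_title].to_numpy(), data[0].astype(int).to_numpy()]
    )
    out = data.drop(columns=0).astype(int)
    out.index = index
    out.columns = header
    return out

# Rebuild the sample frame from the question.
rows = []
for title in ["title1", "title2", "title3"]:
    rows.append([title, "a", "b", "c", "d"])
    rows += [[n, 2, 3, 4, 5] for n in (1, 10, 100)]

result = to_multiindex(pd.DataFrame(rows))
print(result)
```

Testing the cell type rather than a specific header value keeps the mask independent of what the header row happens to contain, but it would break if the numeric rows were stored as strings.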
Upvotes: 1
Reputation: 862581
Use:
#build a mask that distinguishes the string values in column 0
m = pd.to_numeric(df[0], errors='coerce').isna()
#alternative
#m = ~df[0].astype(str).str.isnumeric()
#insert a new first column 'a' holding the title strings, forward-filled
df.insert(0, 'a', df[0].where(m).ffill())
#mask for the rows where the two columns differ (the data rows)
m1 = df['a'].ne(df[0])
#create the MultiIndex
df = df.set_index(['a', 0])
#take the new column names from the first row
df.columns = df.iloc[0]
#keep only the data rows and remove the index and column names
df = df[m1.values].rename_axis((None, None)).rename_axis(None, axis=1)
print(df)
a b c d
title1 1 2 3 4 5
10 2 3 4 5
100 2 3 4 5
title2 1 2 3 4 5
10 2 3 4 5
100 2 3 4 5
title3 1 2 3 4 5
10 2 3 4 5
100 2 3 4 5
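For data shaped like the sample, the two ways of building the mask m give the same result. A small check, assuming a shortened stand-in column (the name col is only for illustration):

```python
import pandas as pd

# Shortened stand-in for column 0 of the frame: titles mixed with integers.
col = pd.Series(["title1", 1, 10, 100, "title2", 1], dtype=object)

# Mask 1: values that cannot be parsed as numbers become NaN, hence True.
m_numeric = pd.to_numeric(col, errors="coerce").isna()
# Mask 2: cast to str first so the .str accessor sees every value.
m_isnumeric = ~col.astype(str).str.isnumeric()

print(m_numeric.tolist())
print((m_numeric == m_isnumeric).all())
```

The two masks can disagree on other data: str.isnumeric() returns False for strings such as "-1" or "1.5", while pd.to_numeric parses them as numbers, so the to_numeric version is likely the safer choice when negative or decimal values may appear.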
Upvotes: 3