Reputation: 137
The DataFrame I have prepared is as follows:
Index and Title | Index |
---|---|
1 aa aa aaaa | 1 |
1.2 bb bbbb bb bbbb bb b | 1.2 |
1.2.3 ccc cc c ccccc cccccc | 1.2.3 |
2 dddd d d dd ddd | 2 |
The DataFrame I want is as follows:
Index and Title | Index | Title |
---|---|---|
1 aa aa aaaa | 1 | aa aa aaaa |
1.2 bb bbbb bb bbbb bb b | 1.2 | bb bbbb bb bbbb bb b |
1.2.3 ccc cc c ccccc cccccc | 1.2.3 | ccc cc c ccccc cccccc |
2 dddd d d dd ddd | 2 | dddd d d dd ddd |
I tried the following code:
df['Title'] = df['Index and Title'].str.replace(df['Index'] + ' ','')
However, it raised this error:
TypeError: 'Series' objects are mutable, thus they cannot be hashed
How should I handle this case?
Upvotes: 2
Views: 111
Reputation: 133680
Based only on the samples shown, this can be handled with the Pandas str.extract function. Please try the following:
df["Title"] = df["Index and Title"].str.extract(r'^\d+(?:(?:\.\d+){1,})?\s+(\D+)$', expand=True)
Or, if the titles themselves may contain digits, try the following:
df["Title"] = df["Index and Title"].str.extract(r'^\d+(?:(?:\.\d+){1,})?\s+(.*)$', expand=True)
The output of df will be as follows:
               Index and Title  Index                  Title
0                 1 aa aa aaaa      1             aa aa aaaa
1     1.2 bb bbbb bb bbbb bb b    1.2   bb bbbb bb bbbb bb b
2  1.2.3 ccc cc c ccccc cccccc  1.2.3  ccc cc c ccccc cccccc
3            2 dddd d d dd ddd      2        dddd d d dd ddd
Explanation: a detailed breakdown of the regex above.
^\d+(?:(?:\.\d+){1,})? ## Matches the leading digits of the Index and Title value, optionally followed by one or more groups of a dot and digits.
\s+ ## Matches one or more whitespace characters.
(\D+)$ ## First capturing group: all non-digit characters up to the end of the value. The second pattern uses (.*) here instead, so the title may contain digits.
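To see how the two patterns differ, here is a minimal sketch that applies them to plain strings with Python's standard re module rather than to a DataFrame. The helper title_of, the SIMPLIFIED spelling, and the extra sample "3 chapter 2 recap" are illustrative assumptions, not part of the answer.

```python
import re

# First pattern from the answer: the title may not contain digits.
STRICT = re.compile(r'^\d+(?:(?:\.\d+){1,})?\s+(\D+)$')
# Second pattern from the answer: the title may contain anything.
LENIENT = re.compile(r'^\d+(?:(?:\.\d+){1,})?\s+(.*)$')
# Assumed shorter spelling: "(?:X{1,})?" matches the same text as "X*".
SIMPLIFIED = re.compile(r'^\d+(?:\.\d+)*\s+(.*)$')

samples = [
    "1 aa aa aaaa",
    "1.2 bb bbbb bb bbbb bb b",
    "1.2.3 ccc cc c ccccc cccccc",
    "2 dddd d d dd ddd",
]


def title_of(pattern, text):
    """Return the captured title, or None when the text does not match."""
    match = pattern.match(text)
    return match.group(1) if match else None


strict_titles = [title_of(STRICT, s) for s in samples]
lenient_titles = [title_of(LENIENT, s) for s in samples]

# A hypothetical title containing a digit only matches the lenient pattern.
with_digit = "3 chapter 2 recap"
strict_result = title_of(STRICT, with_digit)
lenient_result = title_of(LENIENT, with_digit)
```

On the four sample values both patterns capture the same titles. In Pandas, a row that does not match (such as the strict pattern on a title with digits) would come back as NaN from str.extract.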
Upvotes: 2
Reputation: 120489
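Split each value on whitespace, drop the first token (the index), and join the rest back together with single spaces:
df["Title"] = df["Index and Title"].str.split(n=0).str[1:].str.join(" ")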
>>> df
               Index and Title  Index                  Title
0                 1 aa aa aaaa      1             aa aa aaaa
1     1.2 bb bbbb bb bbbb bb b    1.2   bb bbbb bb bbbb bb b
2  1.2.3 ccc cc c ccccc cccccc  1.2.3  ccc cc c ccccc cccccc
3            2 dddd d d dd ddd      2        dddd d d dd ddd
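In Series.str.split, n=0 places no limit on the number of splits, so the line above splits every word and then rejoins them. A possible variant, sketched here on a rebuilt copy of the sample data (the frame construction is an assumption for illustration), is to split only once with n=1 and take the second piece, which leaves the spacing inside the title untouched.

```python
import pandas as pd

# Sample frame rebuilt from the question; Index is assumed to hold strings.
df = pd.DataFrame({
    "Index and Title": [
        "1 aa aa aaaa",
        "1.2 bb bbbb bb bbbb bb b",
        "1.2.3 ccc cc c ccccc cccccc",
        "2 dddd d d dd ddd",
    ],
    "Index": ["1", "1.2", "1.2.3", "2"],
})

# n=1 splits only at the first run of whitespace,
# so element 1 of each list is the whole title.
df["Title"] = df["Index and Title"].str.split(n=1).str[1]
```

A row that contains only an index and no title would get NaN in Title with this variant, since the split list would have no second element.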
Upvotes: 2
Reputation: 863291
If you need to replace using both columns, use a lambda function with axis=1:
df['Title'] = df.apply(lambda x: x['Index and Title'].replace(x['Index'],''), axis=1).str.strip()
If you need only the letters and spaces (without replacing by the Index column), use Series.str.extract with Series.str.strip:
df['Title'] = df['Index and Title'].str.extract('([a-zA-Z ]+)', expand=False).str.strip()
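Series.str.replace expects a single pattern (a string or compiled regex) for the whole column rather than a per-row Series, which appears to be what triggers the TypeError in the question. The row-wise replacement above avoids that, but str.replace on a Python string removes every occurrence of the index text, not just the leading one. A minimal sketch of a stricter variant follows; the helper strip_prefix is hypothetical, the sample frame is rebuilt for illustration, and the Index column is assumed to hold strings.

```python
import pandas as pd

# Sample frame rebuilt from the question; Index is assumed to hold strings.
df = pd.DataFrame({
    "Index and Title": [
        "1 aa aa aaaa",
        "1.2 bb bbbb bb bbbb bb b",
        "1.2.3 ccc cc c ccccc cccccc",
        "2 dddd d d dd ddd",
    ],
    "Index": ["1", "1.2", "1.2.3", "2"],
})


def strip_prefix(text, prefix):
    """Drop prefix from the start of text (if present) and trim spaces."""
    if text.startswith(prefix):
        text = text[len(prefix):]
    return text.strip()


# Pair each combined value with its own index, row by row.
df["Title"] = [
    strip_prefix(full, idx)
    for full, idx in zip(df["Index and Title"], df["Index"])
]
```

Because only the leading occurrence is removed, a title that happens to repeat the index text later on (for example "1 part 1 of 2") keeps it.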
Upvotes: 0