Reputation: 57
I have a DataFrame from a source in which each name is repeated back to back, with no delimiter to split on.
Example:
In [1]
import pandas as pd

data = {"Names": ["JakeJake", "ThomasThomas", "HarryHarry"],
        "Scores": [70, 81, 23]}
df = pd.DataFrame(data)
Out [1]
Names Scores
0 JakeJake 70
1 ThomasThomas 81
2 HarryHarry 23
I would like a way to keep just the first half of each string in the 'Names' column. My initial thought was to do the following:
In [2]
df["N"] = df["Names"].str.len()//2
df["X"] = df["Names"].str[:df["N"]]
However, this gives the following output:
Out [2]
Names Scores N X
0 JakeJake 70 4 nan
1 ThomasThomas 81 6 nan
2 HarryHarry 23 5 nan
The desired output would be:
Names Scores N X
0 JakeJake 70 4 Jake
1 ThomasThomas 81 6 Thomas
2 HarryHarry 23 5 Harry
I'm sure the answer will be something simple, but I can't get my head around it. Cheers
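The NaN values appear to come from the `.str` slice expecting one scalar `stop` for every row, so passing the whole `N` Series does not slice each row by its own value. A minimal sketch that keeps the question's `N` column and does the per-row slice with a plain list comprehension (a swapped-in technique, not something the `.str` accessor provides):

```python
import pandas as pd

data = {"Names": ["JakeJake", "ThomasThomas", "HarryHarry"],
        "Scores": [70, 81, 23]}
df = pd.DataFrame(data)

# Half the length of each name, exactly as in the question
df["N"] = df["Names"].str.len() // 2

# zip pairs every name with its own N, so each row is sliced individually
df["X"] = [name[:n] for name, n in zip(df["Names"], df["N"])]

print(df)
```

The list comprehension runs in plain Python, which is usually fine for string work like this, where pandas has no truly vectorized per-row slice anyway.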
Upvotes: 1
Views: 2203
Reputation: 23217
You can use .map() on column Names, as follows:
df['X'] = df['Names'].map(lambda x: x[:len(x)//2])
Result:
print(df)
Names Scores X
0 JakeJake 70 Jake
1 ThomasThomas 81 Thomas
2 HarryHarry 23 Harry
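The lambda above halves every string unconditionally, so a name that is not actually doubled would be cut in half too. If some rows might not be doubled (an assumption; the question's sample has none), a minimal sketch with a named helper could check first. `first_half_if_doubled`, "Anna" and "Bob" are hypothetical and only for illustration:

```python
import pandas as pd

df = pd.DataFrame({"Names": ["JakeJake", "ThomasThomas", "HarryHarry"],
                   "Scores": [70, 81, 23]})


def first_half_if_doubled(s: str) -> str:
    """Hypothetical helper: return the first half of s only if s is that half written twice."""
    half = len(s) // 2
    # Halve only when the length is even and both halves are identical
    if len(s) % 2 == 0 and s[:half] == s[half:]:
        return s[:half]
    # Otherwise leave the string unchanged
    return s


df["X"] = df["Names"].map(first_half_if_doubled)
print(df)
```

With this helper, "Anna" and "Bob" come back unchanged instead of becoming "An" and "B".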
Upvotes: 2
Reputation: 23099
Use a regex to split the camel case. The pattern matches the empty position between a lowercase letter and an uppercase letter that immediately follows it, so each name splits into its two copies; the first copy goes in X and its length in N:
n = df['Names'].str.split(r'(?<=[a-z])(?=[A-Z])', expand=True)[0]
df['X'], df['N'] = n, n.str.len()
print(df)
Names Scores X N
0 JakeJake 70 Jake 4
1 ThomasThomas 81 Thomas 6
2 HarryHarry 23 Harry 5
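This split only returns the first name when the sole lowercase-to-uppercase transition in the string is the join point, i.e. each name starts with a capital and contains no other capitals. A quick check of the same pattern with Python's built-in `re` module shows where that assumption breaks. "McDonaldMcDonald" and "JAKEJAKE" are made-up test inputs:

```python
import re

# Zero-width pattern: the gap between a lowercase letter
# and an uppercase letter that immediately follows it
boundary = re.compile(r"(?<=[a-z])(?=[A-Z])")

print(boundary.split("ThomasThomas"))      # ['Thomas', 'Thomas']
print(boundary.split("McDonaldMcDonald"))  # ['Mc', 'Donald', 'Mc', 'Donald']
print(boundary.split("JAKEJAKE"))          # ['JAKEJAKE']
```

Splitting on an empty match like this needs Python 3.7 or later. For names like these, a length-based or backreference-based approach is more robust.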
Upvotes: 0
Reputation: 18315
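With a regex to extract the names, and str.len for the lengths:
df["X"] = df.Names.str.extract(r"^(.+)\1$", expand=False)
df["N"] = df.X.str.len()
The regex only matches a whole string made of one substring written twice: ^ and $ anchor the match to the entire string, (.+) captures the first copy, and \1 is a backreference that requires the same text to follow (it refers to the first capturing group within the regex). expand=False makes extract return a Series rather than a one-column DataFrame.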
>>> df
Names Scores X N
0 JakeJake 70 Jake 4
1 ThomasThomas 81 Thomas 6
2 HarryHarry 23 Harry 5
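Because the pattern must match the whole string, rows that are not exact doubles get a missing value rather than a wrong half. A minimal sketch of that behaviour, plus one possible fallback to the original string via `fillna` (the fallback is a design choice, not part of the answer; "Anna" and "Bob" are made-up inputs):

```python
import pandas as pd

# Two doubled names and two made-up names that are not doubled
names = pd.Series(["JakeJake", "ThomasThomas", "Anna", "Bob"])

# ^(.+)\1$ matches only strings that are one substring written twice
first = names.str.extract(r"^(.+)\1$", expand=False)
print(first.tolist())  # Jake, Thomas, then missing values for Anna and Bob

# Fall back to the original string wherever there was no match
cleaned = first.fillna(names)
print(cleaned.tolist())  # ['Jake', 'Thomas', 'Anna', 'Bob']
```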
Upvotes: 2
Reputation: 18426
You can use apply on the Names column, then keep only the required part of each string.
>>> df.assign(x=df['Names'].apply(lambda x: x[:len(x)//2]))
Names Scores x
0 JakeJake 70 Jake
1 ThomasThomas 81 Thomas
2 HarryHarry 23 Harry
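Note that `DataFrame.assign` returns a new DataFrame and leaves `df` itself untouched, so the new column only sticks once the result is bound back. A small self-contained check (the name `out` is just illustrative):

```python
import pandas as pd

df = pd.DataFrame({"Names": ["JakeJake", "ThomasThomas", "HarryHarry"],
                   "Scores": [70, 81, 23]})

# assign() builds a new DataFrame with the extra column x
out = df.assign(x=df["Names"].apply(lambda s: s[:len(s) // 2]))

print("x" in df.columns)  # False: the original df was not modified
print(out["x"].tolist())  # ['Jake', 'Thomas', 'Harry']

# Rebind df to keep the new column
df = out
print("x" in df.columns)  # True
```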
Upvotes: 2