Reputation: 1288
I feel like I found the answer to this before, but looking back I can't find it anywhere.
Is there a quick, painless way to split strings in a specific series in a dataframe?
For example, the series df['a'] looks like this:
df['a'] = ['abc 123', 'bcd 2344456jlkj6', 'dfe 456jklj34534', 'akg bg23534535']
What I want at the end is just
df['a'] = ['abc', 'bcd', 'dfe', 'akg']
I originally tried df['a'] = df['a'].str.split(' ')[0], but that just gave me index errors.
Upvotes: 0
Views: 69
Reputation: 8786
This should work for you:
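import pandas as pd

df = pd.DataFrame({"a": ['abc 123', 'bcd 2344456jlkj6', 'dfe 456jklj34534', 'akg bg23534535']})
print(df['a'])

# Collect the part before the first space of every value.
first_parts = []
for value in df['a']:
    first_parts.append(value.split(' ')[0])
df['a'] = first_parts
print(df['a'])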
Which yields:
0 abc 123
1 bcd 2344456jlkj6
2 dfe 456jklj34534
3 akg bg23534535
Name: a, dtype: object
0 abc
1 bcd
2 dfe
3 akg
Name: a, dtype: object
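For what it's worth, the same loop can be written more compactly. This is only a sketch of the idea above, assuming every value is a plain string (no NaN); the names first_tokens and mapped are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"a": ['abc 123', 'bcd 2344456jlkj6', 'dfe 456jklj34534', 'akg bg23534535']})

# List comprehension: keep only the text before the first space.
first_tokens = [value.split(' ')[0] for value in df['a']]

# Series.map applies the same function to each element and returns a Series.
mapped = df['a'].map(lambda value: value.split(' ')[0])

df['a'] = first_tokens
print(df['a'].tolist())
```

Both variants still run Python-level code once per row, so they are mainly a readability change rather than a speedup.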
Upvotes: 0
Reputation: 353099
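You were very close; you simply need an extra .str in there. A plain [0] after .str.split() indexes the Series itself (it looks up the row labelled 0), whereas .str[0] takes the first element of each row's list: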
>>> import pandas as pd
>>> df = pd.DataFrame({"a": ['abc 123', 'bcd 2344456jlkj6', 'dfe 456jklj34534', 'akg bg23534535']})
>>> df["a"].str.split().str[0]
0 abc
1 bcd
2 dfe
3 akg
Name: a, dtype: object
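To make the difference between the two kinds of indexing concrete, here is a small sketch. It assumes the default integer index (so the label 0 exists); the variable names pieces, row_zero and first_tokens are just for illustration:

```python
import pandas as pd

df = pd.DataFrame({"a": ['abc 123', 'bcd 2344456jlkj6', 'dfe 456jklj34534', 'akg bg23534535']})

# .str.split returns a Series whose elements are lists of substrings.
pieces = df["a"].str.split(" ")

# Plain [0] is a label lookup on the Series: it returns the list for row 0 only.
row_zero = pieces[0]

# .str[0] indexes into every element, giving the first substring of each row.
first_tokens = pieces.str[0]

print(row_zero)
print(first_tokens.tolist())
```

Since row_zero is a single two-element list rather than one value per row, assigning it back to a four-row column cannot give the intended result.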
Upvotes: 2
Reputation: 879621
In [158]: df
Out[158]:
a
0 abc 123
1 bcd 2344456jlkj6
2 dfe 456jklj34534
3 akg bg23534535
In [159]: df['a'].str.extract(r'^(\w+)', expand=False)
Out[159]:
0 abc
1 bcd
2 dfe
3 akg
Name: a, dtype: object
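One caveat if you run this on a recent pandas: since version 0.23 str.extract defaults to expand=True and returns a DataFrame, even for a single capture group. The sketch below assumes such a version; as_series and as_frame are illustrative names:

```python
import pandas as pd

df = pd.DataFrame({"a": ['abc 123', 'bcd 2344456jlkj6', 'dfe 456jklj34534', 'akg bg23534535']})

# expand=False with a single capture group returns a Series.
as_series = df['a'].str.extract(r'^(\w+)', expand=False)

# The default (expand=True) returns a DataFrame with one column per group;
# an unnamed group gets the integer column label 0.
as_frame = df['a'].str.extract(r'^(\w+)')

print(type(as_series).__name__, as_series.tolist())
print(type(as_frame).__name__, as_frame[0].tolist())
```

Either result can be assigned back to df['a']; with the DataFrame form, select column 0 first.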
Upvotes: 0