Reputation: 239
I am using pandas, and my data looks like this:
FirstName LastName StudentID
FirstName2 LastName2 StudentID2
Then I split each row on whitespace using str.split(), so the data looks like this in the DataFrame:
[[FirstName, LastName, StudentID],
[FirstName2, LastName2, StudentID2]]
How can I take only the StudentID for every student and save it in a new column?
Upvotes: 21
Views: 75930
Reputation: 56875
For OP's particular case, another approach is to use .str.extract() to capture the desired chunk of the string, without creating an intermediate list:
>>> import pandas as pd # 2.0.3
>>> df = pd.DataFrame({"a": ["aaa bb cc", "xx y zzzz"]})
>>> df["a"].str.extract(r"(\S+)$")
0
0 cc
1 zzzz
The regex (\S+)$ grabs one or more non-whitespace characters at the end of the string. (\S+)\s*$ is useful if there is trailing whitespace.
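To see why the trailing-whitespace variant matters, here is a minimal sketch that uses Python's built-in re module rather than pandas, so that only the regex behaviour is in play. The sample strings and the helper name last_token are made up for illustration.

```python
import re

samples = ["aaa bb cc", "xx y zzzz   "]  # second string has trailing spaces

strict = re.compile(r"(\S+)$")       # last token must touch the end of the string
lenient = re.compile(r"(\S+)\s*$")   # allows whitespace after the last token

def last_token(pattern, text):
    """Return the captured group, or None when the pattern does not match."""
    match = pattern.search(text)
    return match.group(1) if match else None

strict_results = [last_token(strict, s) for s in samples]
lenient_results = [last_token(lenient, s) for s in samples]

print(strict_results)   # ['cc', None]
print(lenient_results)  # ['cc', 'zzzz']
```

The strict pattern finds nothing in the second string because the final non-whitespace run is not adjacent to the end of the string.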
This answer shows the intermediate list idea nicely, but you can use an index rather than .get():
>>> df["a"].str.split().str[-1]
0 cc
1 zzzz
Name: a, dtype: object
See also Selecting the last element of a list inside a pandas dataframe
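A quick way to check that the index form and the .get() form agree, sketched under the assumption of a recent pandas version. The sample data, including the deliberately empty string, is invented for illustration.

```python
import pandas as pd

# Hypothetical sample data; the last row is empty on purpose.
s = pd.Series(["aaa bb cc", "xx y zzzz", ""])

tokens = s.str.split()            # each element becomes a list of tokens
by_index = tokens.str[-1]         # indexing syntax
by_get = tokens.str.get(-1)       # method syntax

print(by_index.equals(by_get))    # the two results should be identical
print(by_index.isna().tolist())   # only the empty row should be missing
```

A row that yields no tokens comes back as a missing value instead of raising an IndexError, which is one reason to prefer the .str accessor over indexing plain Python lists.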
Upvotes: 1
Reputation: 96
I thought I would add this simple solution, which avoids an explicit Python list or list comprehension: split the existing column and store the last item from each split in a new column of the DataFrame.
import pandas as pd
data = ['FirstName LastName StudentID',
'FirstName2 LastName2 StudentID2']
df = pd.DataFrame(data=data, columns=['text'])
df['id'] = df.text.str.split(" ").str.get(-1)
Output:
index text id
0 FirstName LastName StudentID StudentID
1 FirstName2 LastName2 StudentID2 StudentID2
Upvotes: 4
Reputation: 323226
Using the DataFrame constructor:
pd.DataFrame(df.text.str.split(' ').tolist()).iloc[:, -1]
Out[15]:
0 StudentID
1 StudentID2
Name: 2, dtype: object
Upvotes: 0
Reputation: 61910
You could do something like this:
import pandas as pd
data = ['FirstName LastName StudentID',
'FirstName2 LastName2 StudentID2']
df = pd.DataFrame(data=data, columns=['text'])
df['id'] = df.text.apply(lambda x: x.split()[-1])
print(df)
Output
text id
0 FirstName LastName StudentID StudentID
1 FirstName2 LastName2 StudentID2 StudentID2
Or, as an alternative:
df['id'] = [x.split()[-1] for x in df.text]
print(df)
Output
text id
0 FirstName LastName StudentID StudentID
1 FirstName2 LastName2 StudentID2 StudentID2
Upvotes: 1
Reputation: 11224
Why not try a simple list comprehension?
students = [
["FirstName", "LastName", "StudentID"],
["FirstName2", "LastName2", "StudentID2"]
]
print([student[2] for student in students])
which will print
['StudentID', 'StudentID2']
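Note that student[2] only works when every row has exactly three fields. Below is a small sketch of the same comprehension with a negative index, which picks the last field regardless of row length. The four-field row is a made-up example.

```python
students = [
    ["FirstName", "LastName", "StudentID"],
    ["FirstName2", "MiddleName2", "LastName2", "StudentID2"],  # hypothetical longer row
]

last_fields = [student[-1] for student in students]   # always the final field
third_fields = [student[2] for student in students]   # fixed position

print(last_fields)   # ['StudentID', 'StudentID2']
print(third_fields)  # ['StudentID', 'LastName2']
```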
Upvotes: -2
Reputation: 2843
Use a list comprehension to take the last element of each of the split strings:
ids = [val[-1] for val in your_string.str.split()]  # assumes your_string is a pandas Series of strings
Upvotes: 8