Reputation: 55
I am new to Python and have been working through some hands-on exercises, and I am stuck.
I have data in .csv format, which I imported into Python using
data = pandas.read_csv("data.csv")
data.head()
user rating id
0 1 3.5 1_1193
1 1 3.5 1_661
2 1 3.5 1_914
3 1 3.5 1_3408
4 1 3.5 1_2355
From the 'id' column, I need the number that comes after the '_'.
What I have tried doing is:
data.id.split('_')
which gave me the error: "'DataFrame' object has no attribute 'split'"
So I converted the 'id' column to a NumPy array, following a solution I found on Stack Overflow.
s1 = data.id.values
s2 = np.array2string(s1, separator=',',suppress_small=True)
s2.split('_')
This gives me output as:
["['1",
"1193','1",
"661','1",
"914',..., '6040",
"161','6040",
"2725','6040",
"1784']"]
s2.split('_')[1]
gave me:
"1193','1"
What should I do to get the string after the "_"?
Upvotes: 2
Views: 826
Reputation: 863801
You need the vectorized str.split, then select the second element of each resulting list with str[1] (the pandas docs on working with text data cover this):
data['a'] = data.id.str.split('_').str[1]
print (data)
user rating id a
0 1 3.5 1_1193 1193
1 1 3.5 1_661 661
2 1 3.5 1_914 914
3 1 3.5 1_3408 3408
4 1 3.5 1_2355 2355
print (data.dtypes)
user int64
rating float64
id object
a object <- format is object (obviously string)
dtype: object
#split and cast column to int
data['a'] = data.id.str.split('_').str[1].astype(int)
print (data)
user rating id a
0 1 3.5 1_1193 1193
1 1 3.5 1_661 661
2 1 3.5 1_914 914
3 1 3.5 1_3408 3408
4 1 3.5 1_2355 2355
print (data.dtypes)
user int64
rating float64
id object
a int32 <- format is int
dtype: object
If you also need to replace the id column with the new values:
data.id = data.id.str.split('_').str[1]
print (data)
user rating id
0 1 3.5 1193
1 1 3.5 661
2 1 3.5 914
3 1 3.5 3408
4 1 3.5 2355
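Alternatively, str.get(1) selects the same element as str[1] (run this on the original id column, not one that has already been replaced):
data.id = data.id.str.split('_').str.get(1)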
print (data)
user rating id
0 1 3.5 1193
1 1 3.5 661
2 1 3.5 914
3 1 3.5 3408
4 1 3.5 2355
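For anyone who wants to run the steps above end to end, here is a minimal self-contained sketch. The sample frame is rebuilt by hand from the data.head() output in the question instead of being read from data.csv, so treat the data as an assumption about what the real file contains.

```python
import pandas as pd

# Sample rows rebuilt by hand from the data.head() output in the question
# (assumption: the real data.csv has the same three columns).
data = pd.DataFrame({
    "user": [1, 1, 1, 1, 1],
    "rating": [3.5, 3.5, 3.5, 3.5, 3.5],
    "id": ["1_1193", "1_661", "1_914", "1_3408", "1_2355"],
})

# Split each string on '_' and keep the second piece of every list.
suffix = data["id"].str.split("_").str[1]

# .str.get(1) picks the same element as .str[1].
assert suffix.equals(data["id"].str.split("_").str.get(1))

# Cast to integers; whether this shows up as int32 or int64
# depends on the platform and pandas/NumPy versions.
data["a"] = suffix.astype(int)
print(data)
```

The new column is kept separate from id here, so the snippet can be rerun without the split being applied to an already-shortened value.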
Upvotes: 2
Reputation: 294586
A couple more options...
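1. str.extract
df.id.str.extract(r'.*_(.*)', expand=False)
2. str.replace (pass regex=True; since pandas 2.0 the pattern is treated as a literal string by default)
df.id.str.replace(r'.*_', '', regex=True)
Both yield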
0 1193
1 661
2 914
3 3408
4 2355
Name: id, dtype: object
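A small runnable sketch of both options, assuming a Series built by hand from the ids shown in the question (the name ids is just for illustration). Note that regex=True is passed explicitly to str.replace, because recent pandas versions treat the pattern as a literal string otherwise.

```python
import pandas as pd

# Hand-built stand-in for the 'id' column from the question (assumption).
ids = pd.Series(["1_1193", "1_661", "1_914", "1_3408", "1_2355"], name="id")

# Option 1: capture everything after the last underscore.
# expand=False returns a Series instead of a one-column DataFrame.
extracted = ids.str.extract(r".*_(.*)", expand=False)

# Option 2: delete everything up to and including the last underscore.
replaced = ids.str.replace(r".*_", "", regex=True)

print(extracted.tolist())
print(replaced.tolist())
```

Because `.*` is greedy, both patterns key on the last underscore, which only matters if an id ever contains more than one.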
Upvotes: 1