Reputation: 119
I have a df:
col1
0 01139290201001
1 01139290101001
2 01139290201002
3 01139290101002
4 01139290201003
5 01139290101003
6 01139290201004
7 01139310101001
8 01139290201005
9 01139290301001
...
5908 01139ÅÊ21020
5909 01139ÅÊ21013
5910 01139ÅÊ11008
5911 01139ÅÊ21011
5912 01139ÅÊ03003
and I need to extract into a new column the first 7 characters for the digit-only rows, and the first 5 characters plus the 8th and 9th characters for the rows that contain letters.
I tried this code on a made-up dataframe and it worked, but on the actual dataset it didn't work as expected. The main reason seems to be that my actual df
holds integers, so it did calculations on them.
df['col2'] = df['col1'][0:5] + df['col1'][8]
0 0113929020100101139290201005
1 0113929010100101139290201005
2 0113929020100201139290201005
3 0113929010100201139290201005
4 0113929020100301139290201005
5 NaN
6 NaN
7 NaN
8 NaN
9 NaN
Also, why does it produce NaN values?
I want it to look like this:
01139290201001 to 0113929 for the digit-only rows, and like this for the others:
01139ÅÊ03003 to 0113903
Upvotes: 1
Views: 226
Reputation: 82785
Using .apply
Ex:
import pandas as pd
df = pd.DataFrame({"col1": ["01139290201001", "01139290101001", "01139290201002", "01139ÅÊ21020", "01139ÅÊ21013", "01139ÅÊ11008"]})
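# Digit-only rows: first 7 characters; otherwise: first 5 plus the 8th and 9th characters (x[7:9] on Python 3 str)
df["col2"] = df["col1"].apply(lambda x: x[:7] if x.isdigit() else x[:5] + x[7:9])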
print(df)
Output:
col1 col2
0 01139290201001 0113929
1 01139290101001 0113929
2 01139290201002 0113929
3 01139ÅÊ21020 0113921
4 01139ÅÊ21013 0113921
5 01139ÅÊ11008 0113911
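On the NaN part of the question: as far as I can tell, df['col1'][0:5] slices rows 0 to 4 rather than the first five characters, and df['col1'][8] is the single value in row 8, so the sum is a 5-row Series with row 8's string appended to each value. Assigning that back to the dataframe aligns on the index, which would leave every other row as NaN. Character slicing goes through the .str accessor instead. Below is a rough sketch of both points, assuming the column holds strings (the sample rows and the names partial, aligned and digits_only are only for illustration):

```python
import numpy as np
import pandas as pd

# Illustrative sample rows taken from the question
df = pd.DataFrame({"col1": ["01139290201001", "01139290101001",
                            "01139ÅÊ21020", "01139ÅÊ03003"]})

# Row slicing: [0:2] selects rows 0 and 1, and [3] is the scalar in row 3,
# so this concatenates row 3's string onto rows 0 and 1 only.
partial = df["col1"][0:2] + df["col1"][3]

# Column assignment aligns on the index; reindex shows the same effect:
# rows missing from `partial` become NaN.
aligned = partial.reindex(df.index)

# Character slicing with the .str accessor, applied to the whole column
s = df["col1"].astype(str)
digits_only = s.str.isdigit()
df["col2"] = np.where(digits_only, s.str[:7], s.str[:5] + s.str[7:9])

print(aligned)
print(df)
```

If the real column was parsed as integers when loading, the leading zeros are already gone before any slicing happens; in that case reading the column as text (for example with the dtype argument of pd.read_csv) is probably needed first.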
Upvotes: 3