Reputation: 1886
I am trying to extract and split a column of data into two new columns. This is reasonably simple using .str[xx:yy]
However, I am getting an error when I am trying to complete the same after having run a .groupby(["xxx", "yyy", "zzz"])[["aaa"]].count()
I get told that KeyError: 'Column name'
As an example, after running the .groupby command, I get the following table:
DevReg_df = DevReg_df.groupby(["Dev Loc", "Dev Model", "Dev OS"])[["Number"]].count()
+---------+-----------+--------+--------+----------+
| Dev Loc | Dev Model | Dev OS | Number | |
+---------+-----------+--------+--------+----------+
| ar_ar | Sam 22 | A 5 | 2 | |
| | Sam 4 | A 6 | 5 | |
| az_ae | i 1 | i 2 | 6 | |
| | i 34 | i 9 | 98 | |
+---------+-----------+--------+--------+----------+
Running the following code afterwards gets me:
DevReg_df["Language"] = DevReg_df["Dev Loc"].str[:2]
KeyError: 'Dev Loc'
The outcome of the table I am expecting is:
+---------+-----------+--------+--------+----------+
| Dev Loc | Dev Model | Dev OS | Number | Language |
+---------+-----------+--------+--------+----------+
| ar_ar | Sam 22 | A 5 | 2 | ar |
| | Sam 4 | A 6 | 5 | ar |
| az_ae | i 1 | i 2 | 6 | az |
| | i 34 | i 9 | 98 | az |
+---------+-----------+--------+--------+----------+
Edit: Reason that I don't want to do the commands in the reverse order (before the groupby) is that there are more than 30m rows which the groupby reduces to less than 100k. Doing the other way causes large delays in completing the script.
Upvotes: 0
Views: 75
Reputation: 173
I think using apply and then splitting the series item might solve what you are looking for!! Try this
DevReg_df["Language"] = DevReg_df["Device Loc"].apply(lambda x:x.split('_')[0])
Upvotes: 1