Months ago, I was among the first people to buy tickets to the first post-Corona Python conference here in Israel. I use Python extensively in my data science work as well as in some of my other projects, so I thought this would be a great place to learn more about one of the tools I use every day. The good news is that it was. I learned about things at all levels, from some neat coding tricks, to how to do some higher-level machine learning that is directly applicable to my job, to new packages and SaaS offerings that can or will play a role in the work I do in the future. So, what were some of the highlights?
The day started off with a keynote panel (instead of a keynote speaker) discussing some of the pros and cons of varying levels of strictness when it comes to code review. It was interesting, but it isn’t really part of my world right now, plus nothing beats this keynote speech:
Supercharging Pipeline Efficiency with ML Performance Prediction
This was actually pretty cool, mostly because it pertained directly to the work I am doing (and will be doing) in the world of Machine Learning Operations. The presenters' company has clients whose data volumes vary with how much each client spends on online advertising. The bigger clients' jobs would take so long to process that the smaller clients' jobs would sometimes get cancelled as a result. So they applied machine learning (what else) to predict which clients' jobs would end up being large or small on any given day.
Apps to check out: Celery
Concepts to apply/research: serializing models, using AWS Athena to query S3 buckets
Detecting Anomalous Sequences
An employee from PayPal demonstrated to the room how to use various word vector models to weed out incongruous/suspect text in an effort to increase security on the PayPal platform. The presentation was rushed due to technical difficulties.
Concepts to apply/research: Word2Vec (this is a common topic at Python conferences), BERT, Autoencoders
Zero To Hero: Few Shot Learning + Multi-armed Bandit
Two trainers live-coded some ML models, which at the beginning were used to predict Boolean (True/False) answers to questions from a dataset and were then applied (I think) to help predict which of a set of theoretical slot machines was most prone to returning a sizeable jackpot. I actually learned a neat trick: you can increment a variable conditionally in a Python script by putting an if/else expression after the “+=”. I think that saves a line of code but may cost in terms of readability.
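For my own notes, here is a minimal sketch of what I believe the “+=” trick was: Python's conditional expression (`x if condition else y`) used on the right-hand side of the increment. The list `answers` and the counter `correct` are names I made up for illustration, not something from the talk.

```python
# Hypothetical data: True means a question was answered correctly.
answers = [True, False, True, True]

correct = 0
for answer in answers:
    # One-line form: add 1 when the condition holds, otherwise add 0.
    correct += 1 if answer else 0

# The longer, arguably more readable form does the same thing.
correct_long = 0
for answer in answers:
    if answer:
        correct_long += 1

print(correct, correct_long)
```

Both loops produce the same count, so the choice really does come down to readability.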
Concepts to apply: I work with True/False data every day, is there something I can use from this to predict my outcomes?
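To help myself remember the slot machine part, here is a small sketch of the multi-armed bandit idea using an epsilon-greedy strategy. This is a standard textbook approach that I am assuming for illustration; I do not know which strategy the trainers actually coded. The payout probabilities, the function name `epsilon_greedy`, and all parameter values are made up.

```python
import random


def epsilon_greedy(payout_probs, pulls=5000, epsilon=0.1, seed=0):
    """Simulate pulling slot machines and estimate each machine's payout rate."""
    rng = random.Random(seed)  # seeded so repeated runs behave the same
    counts = [0] * len(payout_probs)    # how often each machine was pulled
    values = [0.0] * len(payout_probs)  # running average reward per machine

    for _ in range(pulls):
        if rng.random() < epsilon:
            # Explore: try a random machine.
            arm = rng.randrange(len(payout_probs))
        else:
            # Exploit: pull the machine with the best estimate so far.
            arm = max(range(len(payout_probs)), key=lambda i: values[i])

        # Simulated reward: 1 for a win, 0 for a loss.
        reward = 1 if rng.random() < payout_probs[arm] else 0

        counts[arm] += 1
        # Incremental update of the running average.
        values[arm] += (reward - values[arm]) / counts[arm]

    return counts, values


# Three hypothetical machines with hidden win rates of 20%, 50% and 80%.
counts, values = epsilon_greedy([0.2, 0.5, 0.8])
best_arm = max(range(len(values)), key=lambda i: values[i])
print("pull counts:", counts)
print("best machine by estimate:", best_arm)
```

The `epsilon` value controls the trade-off: higher means more time spent trying machines at random, lower means more time spent on whichever machine currently looks best.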
Monorepo – One Repo To Rule Them All
This presentation was fully scripted, which was a little weird to listen to, but overall interesting, as it showed how one can easily combine multiple repos into a single one and help integrate that into one’s CI/CD processes (until two weeks ago I had never heard of CI/CD, now it seems to be everywhere). The presenter was clear about the pros and cons of this approach.
Packages mentioned: ToMono
Effective Protobuf: Everything You Wanted To Know, But Never Dared To Ask
Honestly, I have no idea what this was talking about. I need to research this more on my own but as it was presented it was way over my head.
Concepts to research: Protobuf
There is always another way: Sharpen your NumPy skills with the 8 Queens puzzle
This was a cool presentation. “8 queens” is apparently a famous puzzle and the presenter found five different ways to solve it with NumPy (apparently pronounced Num-Pie, even though English would tell you it is Num-Pee; I have discussed this on my Facebook page). I learned a ton of new functionality in NumPy that I may never use, but it is cool to know that you can so easily manipulate arrays and the like using this tool. The presenter said he read the documentation searching for functions that could help solve the problem (obviously reading the documentation is important, but I’m not sure I would read it like a book! Good for him!). In the end he was able to solve the problem with code that took less than 1 second to execute. Very cool.
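I did not write down any of the presenter's five solutions, so the following is my own rough sketch of one way to attack the puzzle with NumPy, not his code. The idea, as I understand it: every permutation of column numbers already puts one queen in each row and column, so all that is left is to check the diagonals. The function name `solve_queens` and the helper `all_distinct` are my own.

```python
from itertools import permutations

import numpy as np


def solve_queens(n=8):
    """Return an array with one row per solution; entry r is the queen's column in row r."""
    rows = np.arange(n)
    # Each permutation places exactly one queen per row and per column.
    perms = np.array(list(permutations(range(n))))  # shape (n!, n)

    def all_distinct(a):
        # Sort each candidate's values; a zero gap between neighbours means a repeat.
        s = np.sort(a, axis=1)
        return (np.diff(s, axis=1) != 0).all(axis=1)

    # Queens share a diagonal when column + row (or column - row) repeats.
    valid = all_distinct(perms + rows) & all_distinct(perms - rows)
    return perms[valid]


solutions = solve_queens()
print(solutions.shape)
print(solutions[0])
```

Checking all 40,320 permutations at once with array operations, instead of looping over them one by one, is the kind of thing that makes NumPy feel fast.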
Meet the Best Feature in Python 3.10: match-case
So Python 3.10 apparently has this new feature called match-case, which looks like a derivative of the switch-case that you may be familiar with if you ever did PHP programming. This one, though, seems specialized for matching on the structure of objects rather than just comparing values. The presenter used match-case to create a custom linter script that would highlight when he made unintended comparisons between Boolean values and tuples. Definitely a lot to look into here, and it was cool to see how what he wrote actually triggered an error while he was writing code in VS Code.
Research: customizing linters, ast, Flake8, how to build a CLI
Exploring the Cheese Shop – What’s in the Python Package Index?
This was a high-level overview of how PyPI works and what pitfalls can befall you if you fail to be vigilant when using or updating various packages. Apparently, it is very simple to upload your own package to PyPI (the upload itself is done with a tool like Twine, since pip has no upload command), who knew.
Research: Poetry, Twine
I learned A TON and hopefully I’ll have an opportunity to write some further blog posts on the follow-up learning I do from the things I learned today. The presenters were knowledgeable, and I even had a good conversation with one later in the day who also happened to be a New England native, but that was about it in terms of the networking I was able to do today. There were no “semi-facilitated” networking sessions, and from what I could tell about 80% of the participants were attending with their team from their employer as opposed to coming as individual contractors. The space is juuuust the right size right now, but they may want to consider a larger venue come next year. More to come tomorrow!