The current invitation (May 2023) is for T-SQL Tuesday #162. The next invitation should be released on June 6.
T-SQL Tuesday #162 – Data Science in the time of ChatGPT
Invitation from Tomaz Kastrun.
Instead of writing and asking data science questions, let’s discuss the aspects of data science in the presence of ChatGPT (GPT-4).
By now, everyone knows that ChatGPT is a large language model (LLM) based on the GPT (Generative Pre-trained Transformer) architecture. It uses deep learning, namely neural networks with billions of weights built from transformer blocks, to generate the sequence of tokens that makes up a piece of text. Transformers introduce the concept of “paying attention” to earlier parts of the sequence, which generally produces better text. The model operates primarily on the probabilities of words and their order, which is why it is good at human-like responses to natural language queries and makes for a conversation-like experience.
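The two ideas in this paragraph, attention and word probabilities, can be sketched in a few lines of Python. This is only a toy illustration and not how ChatGPT is actually implemented: the two-dimensional embeddings, the three-word vocabulary, and the scores are made-up values, and a real model stacks many such layers with learned weights.

```python
import math

def softmax(scores):
    """Turn arbitrary scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Scaled dot-product attention for a single query vector."""
    scale = math.sqrt(len(query))
    # How strongly does the query match each earlier token?
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    # The output is the weighted average of the value vectors.
    dim = len(values[0])
    context = [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]
    return weights, context

# Hypothetical embeddings for three earlier tokens (assumed values).
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = keys
query = [1.0, 1.0]
weights, context = attention(query, keys, values)

# Hypothetical vocabulary and output scores (logits) for the next token.
vocab = ["data", "science", "banana"]
logits = [2.0, 1.0, -1.0]
probs = softmax(logits)
next_token = vocab[probs.index(max(probs))]  # greedy pick of the likeliest word
```

Here the third token receives the largest attention weight because it matches the query best, and the greedy pick simply returns the most probable word. Real systems usually sample from the probabilities instead of always taking the top one.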
Many caveats are hidden in the processing of text, the adjustment of weights, the activation functions (different and tweaked versions of ReLU), the additional corpora, and the billions of pieces of text used for model training.
I have prepared two groups of questions. I will not go into the debate about whether the end of data science is near, nor into the debate about whether AGI (artificial general intelligence) will completely replace the role of data scientists. What I want to hear from you is simply how you embraced (if at all) the use of ChatGPT and what your first impressions were. And mostly: how did it help you (if at all), what did you use it for, and have you encountered any traps?
Usage and working alongside ChatGPT
Imagine using SQL, R, Python, Julia, or Scala for your daily data science work. You can ask ChatGPT practically anything and it will return a relatively coherent and good answer. If you need an explanation, it will excel. Where and for what have you used it? Here is a short list that might get you started:
- Explain a data science algorithm
- Help tune or create SQL code to query big data
- Prepare R, Python, Scala code for exploring the data
- Help you prepare the training of the model in the desired language
- Prepare the code for hyperparameter tuning and cross-validation
- Ask for a data visualization for a given dataset
- Help create a dashboard
- Create code for model deployment, model re-training or model consumption
- Ask for custom functions and algorithm/function adjustments
Now that you have listed where and how it helped you, I would like to understand how much it helped you. Feel free to make a general comparison and add some explanations. And lastly, of course, add whether it has in any way compromised your work as a data scientist (either in a positive way, by embracing it, or as a negative experience).
We have seen many controversies around ChatGPT emerge. Some European Union countries have banned it, and some will soon be doing so too. The question is not only its use (as the end of humanity and empathy) but also the misuse of personal data, privacy issues, and the leaking of relevant corporate information.
Have you considered responsible usage of ChatGPT? Here again is a short list to help you:
- The use of personal data retrieved from the model
- Inserting sensitive (personal or company) data
- Explaining a section of R, Python, or Scala code that is the property of your enterprise
Instead of this, have you tried using it more responsibly:
- Using pseudo code for explanation of the algorithm
- Using mock data rather than real data
- Giving pseudo-code in order to receive the documentation
- Leaving out sensitive information (SQL schema, model information, sensitive data)
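As one possible way to put the last two points into practice, here is a minimal Python sketch that masks e-mail addresses and swaps real table and column names for neutral placeholders before a query is pasted into a prompt. Everything in it is an assumption made for illustration: the helper names `mask_emails` and `pseudonymize_identifiers`, the `<MASKED>` and `obj_N` placeholders, and the sample query. The e-mail pattern is deliberately simple and will not catch every kind of personal data.

```python
import re

# A deliberately simple e-mail pattern (an assumption, not a full validator).
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def mask_emails(text):
    """Replace anything that looks like an e-mail address."""
    return EMAIL.sub("<MASKED>", text)

def pseudonymize_identifiers(sql, sensitive_names):
    """Swap real table/column names for neutral placeholders.

    Returns the rewritten SQL and a mapping for translating the
    answer back to the real names afterwards.
    """
    mapping = {}
    for i, name in enumerate(sensitive_names, start=1):
        placeholder = f"obj_{i}"
        mapping[placeholder] = name
        # \b keeps us from rewriting parts of longer identifiers.
        sql = re.sub(rf"\b{re.escape(name)}\b", placeholder, sql,
                     flags=re.IGNORECASE)
    return sql, mapping

# Hypothetical query containing a schema and a personal e-mail address.
sql = ("SELECT Email, Salary FROM dbo.Employees "
       "WHERE Email = 'jane.doe@example.com'")

safe_sql, mapping = pseudonymize_identifiers(
    mask_emails(sql), ["Employees", "Salary", "Email"]
)
print(safe_sql)
# SELECT obj_3, obj_2 FROM dbo.obj_1 WHERE obj_3 = '<MASKED>'
```

The returned `mapping` stays on your machine, so the placeholders in the answer can be translated back to the real names without the schema ever leaving your environment.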
So which cases have you come across? Did they have any consequences for you? What other responsible uses of ChatGPT have you practiced?
ChatGPT offers interesting answers (based on my experience and search), and it is the next step from a Google search of Stack Overflow. In other words, it gives you a more focused answer. When exploring and searching forums, you might find several different solutions for a single problem, whereas here you have to ask for another solution. On the other hand, it can give you an answer faster than browsing the web. Both approaches have their advantages and disadvantages, but neither will assure you that the answer is correct!
I embrace this technology as an additional learning source. But I personally do not use it as my daily driver, despite trying it out a couple of times (with mixed results: some answers worked, others were nonworking, useless, or meaningless). It can be super helpful for entry/junior positions, but the more experienced you are, the more abstract your data science work, and the more complicated the topics you cover, the less frequently you will presumably use it.