Python is the most widely used programming language in Machine Learning, Big Data and Data Science. Discover everything you need to know about it: definition, advantages, use cases…
Created in 1991, the Python programming language emerged at the time as a way to automate the more boring parts of writing scripts or quickly prototype applications.
In recent years, however, it has become one of the most widely used languages in software development, infrastructure management and data analysis. It is a driving force behind the Big Data explosion.
- Python language: what is it?
- Python: its origins
- Python language: what are the main advantages?
- Python basics
- Python 2 vs Python 3: what are the differences?
- The Python Language for Big Data and Machine Learning
- Is Python suitable for beginners?
- Why do Data Scientists use Python?
- Python and Big Data: the best libraries and packages
- Learn Python with OpenClassrooms
- Version 3.9.7 available since August 2021
- Apple M1: Python natively compatible with macOS 11
- Google Atheris: an open source tool for finding Python bugs
- Python remains the most popular programming language in 2021
- Python: two vulnerabilities allowing remote code execution corrected by the PSF
- Python will overtake Java and C in the TIOBE Index for the first time
- Python 4.0 may never see the light of day, according to its creator
- Python could become 5 times faster in 5 years
Python language: what is it?
Python is an open source programming language created by programmer Guido van Rossum in 1991. It takes its name from Monty Python’s Flying Circus.
It is an interpreted programming language, so it does not need to be compiled before it runs. An “interpreter” program executes Python code on any computer where it is installed. This lets you see the result of a code change quickly. On the other hand, it makes the language slower than a compiled language like C.
As a high-level programming language, Python allows programmers to focus on what they want to do rather than how the machine does it. Writing programs therefore usually takes less time than in a lower-level language, which makes Python a good language for beginners.
Python: its origins
In the mid-1980s, a Dutchman named Guido van Rossum was working on an educational project, which consisted of creating a language for new coders called ABC. While taking part in this initiative, Guido became interested in language design and then started working on Python. He made unusual decisions that set the language apart from the zeitgeist of the time.
Notably, he decided to make indentation meaningful. Some critics believed this would make the language difficult to use, but this characteristic partly explains why Python is both readable and popular. Code style and readability benefit from the way the language has to be written.
A big part of its design is to encourage developers to make good decisions. While indentation is built into Python, many other things aren’t, so writing good code requires being a responsible coder. Unlike Java, Python does not complain when a variable or function is given a particular name, and there is no need to declare a type.
Python language: what are the main advantages?
The Python language owes its popularity to several advantages that benefit beginners and experts alike. First, it’s easy to learn and use. It has a small set of core features, which makes it possible to create programs quickly and with little effort. In addition, its syntax is designed to be readable and straightforward.
Another advantage of Python is its popularity. This language works on all major operating systems and computer platforms. Also, while it’s clearly not the fastest language, it makes up for its slowness with its versatility.
Finally, even if it is mainly used for scripting and automation, this language is also used to create professional-quality software. Whether for applications or web services, a large number of developers rely on Python.
Python basics
First, Python does not require a semicolon to terminate a line, unlike many programming languages. A new line is enough for the interpreter to detect a new statement.
Many languages use braces to define the scope of a block of code, but Python’s interpreter determines it from indentation. This means you have to be especially careful with whitespace, since inconsistent indentation can break the program.
To comment out something in your code, just use the hash #.
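These three syntax rules can be seen together in a few lines. This is only a minimal sketch: the function name `describe` and the sample temperatures are made up for illustration.

```python
# A comment starts with a hash and runs to the end of the line.

def describe(temperature):
    # The indented lines below form the body of the function:
    # indentation, not braces, marks where each block starts and ends.
    if temperature > 30:
        label = "hot"      # no semicolon: the newline ends the statement
    else:
        label = "mild"
    return label


print(describe(35))
print(describe(18))
```

Removing the indentation in front of `label = "hot"` would make the interpreter reject the file with an `IndentationError` before running anything.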
With Python, it is possible to store and manipulate data in a program. A variable stores data such as a number, a username or a password. To create (declare) a variable, just assign it a value with the = symbol.
Every variable has a data type. Strings, integers, Booleans and lists are examples of data types.
- A boolean type can only contain the value True or False.
- An integer is one of three numeric types, alongside floats and complex numbers. An integer is a whole number: positive, negative or zero.
- A string is a sequence of characters and one of the most common data types.
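As an illustration of the data types listed above, the following sketch assigns a few values and asks Python which type it attached to each. All variable names and values are hypothetical.

```python
# Hypothetical sample values, chosen only for illustration.
username = "alice"        # str: a string of characters
age = 31                  # int: a whole number
height = 1.68             # float: a number with a decimal part
impedance = 3 + 4j        # complex: the third numeric type
is_admin = False          # bool: either True or False
scores = [12, 15, 9]      # list: an ordered collection of values

for value in (username, age, height, impedance, is_admin, scores):
    # type() reports the data type Python attached to each value
    print(repr(value), "->", type(value).__name__)
```

Note that no type is written next to the variable names: the `=` sign is all that is needed to create them.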
Operators are symbols applied to values and variables to perform comparisons and mathematical operations.
Comparison operators:
- ==: equal
- !=: not equal
- <: less than
- <=: less than or equal to
Arithmetic operators:
- +: addition
- -: subtraction
- *: multiplication
- /: division
- **: exponentiation
- %: modulus, gives the remainder of a division
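The operators can be tried directly on two numbers. The values 17 and 5 below are arbitrary and only serve as an example.

```python
a = 17
b = 5

# Comparison operators evaluate to a boolean.
print(a == b)   # False
print(a != b)   # True
print(a < b)    # False
print(a <= b)   # False

# Arithmetic operators.
print(a + b)    # 22
print(a - b)    # 12
print(a * b)    # 85
print(a / b)    # 3.4  (in Python 3, / always returns a float)
print(a ** 2)   # 289
print(a % b)    # 2    (17 = 3 * 5 + 2)
```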
Python 2 vs Python 3: what are the differences?
There are two major versions of Python: Python 2 and Python 3, and there are many differences between them. Python 2.x is the old version, which received official updates until 2020. Since then, it has continued to exist only in an unofficial way.
Python 3.x is the current version of the language. It brings many useful new features, such as better concurrency control and a more efficient interpreter. Adoption of Python 3 was long held back by the lack of third-party library support: many libraries were only compatible with Python 2, which complicated the transition. This problem is now mostly solved, and there are few good reasons left to keep using Python 2.
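The text does not list specific differences, so the sketch below picks three well-known ones as examples: `print` as a function, true division, and Unicode strings. It is meant to be run with Python 3; the Python 2 behaviour is described in the comments only.

```python
import sys

# In Python 3, print is a function; in Python 2 it was a statement
# written as: print "hello"
print("hello")

# In Python 3, / is true division even between integers
# (Python 2 returned 3 for 7 / 2); // is floor division.
true_division = 7 / 2
floor_division = 7 // 2
print(true_division, floor_division)

# In Python 3, text strings are Unicode by default.
word = "caf\u00e9"                  # "café", written with an escape
print(len(word))                    # counts characters, not bytes
print(len(word.encode("utf-8")))    # the UTF-8 encoding needs one more byte

print("running on Python", sys.version_info.major)
```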
The Python Language for Big Data and Machine Learning
The primary use case for Python is scripting and automation. Indeed, this language makes it possible to automate interactions with web browsers or application GUIs.
However, scripting and automation are far from the only uses of this language. It is also used for programming applications, for creating web services or REST APIs, or for metaprogramming and code generation.
In addition, this language is used in data science and Machine Learning. With the rise of data analytics across all industries, this has become one of its primary use cases.
The vast majority of libraries used for Data Science or Machine Learning have Python interfaces. As a result, Python has become the most popular high-level command interface for Machine Learning libraries and other numerical algorithms. Many introductory books are available on the Web.
Finally, robotics companies such as Aldebaran use this language to program their robots. The company, since acquired by SoftBank, chose Python to make it easier for third-party companies and hobbyists to design applications.
Is Python suitable for beginners?
Python can be considered beginner-friendly. The language favors readability, which makes it easier to understand and use, and its syntax has similarities with English. This makes it easier for novice programmers to get started in development.
Python is also a flexible and dynamically typed language. This means that types are not declared in advance but attached to values and checked when the program runs, which many find more intuitive. It is also a more forgiving language: many mistakes only surface when the faulty line actually runs, rather than blocking the whole program up front.
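Dynamic typing can be illustrated in a few lines. In this sketch (the variable names are arbitrary), the same name is bound first to an integer and then to a string, and a type mismatch is only reported when the offending line is executed.

```python
value = 42            # value currently refers to an int
print(type(value).__name__)

value = "forty-two"   # the same name can be rebound to a str
print(type(value).__name__)

# Types are still enforced, only later: the check happens when the line runs.
try:
    result = "total: " + 42
except TypeError as error:
    result = None
    print("TypeError:", error)
```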
In fact, ease of use was one of Python’s founding principles when Guido van Rossum began creating it in 1989 (it was first released in 1991). Python’s original goal was to make programming easier, with an emphasis on code readability. It runs on various platforms such as Windows, Linux and macOS, and it is free software.
A beginner will need around 6-8 weeks to learn the basics of Python, that is, enough to understand most lines of Python code. Learning Python well enough to start a new career as a Python developer takes much longer.
Why do Data Scientists use Python?
Python is the most used language for Data Science, and for good reason: it is simple, readable, clean, flexible and compatible with many platforms. Its many libraries, such as TensorFlow, SciPy and NumPy, make it possible to perform a wide variety of tasks.
According to a survey conducted in 2013 by O’Reilly, 40% of Data Scientists use Python on a daily basis. Its very simple syntax makes it usable by people who do not necessarily have an engineering background.
It enables rapid prototyping, and code can be run anywhere: Windows, macOS, UNIX, Linux… Its flexibility allows it to support Machine Learning model development, data mining, classification and many other tasks more quickly than in other languages.
Libraries like Scrapy and BeautifulSoup help extract data from the internet, while Seaborn and Matplotlib help with Data Visualization. For their part, TensorFlow, Keras and Theano allow the development of Deep Learning models, and Scikit-Learn helps in the development of Machine Learning algorithms.
Python and Big Data: the best libraries and packages
If Python has established itself as the best programming language for Big Data, it is thanks to its various data science packages and libraries. Here are the most popular.
Pandas is one of the most popular data science libraries. It was developed by Data Scientists accustomed to R and Python, and is now used by a large number of scientists and analysts.
It offers many very useful built-in features. In particular, it can read data from many sources, build large dataframes from them, and perform aggregate analyses tailored to the questions you want answered.
Visualization features also make it possible to generate graphs from the results of the analyses, or to export them to Excel format. It can also be used to manipulate numerical arrays and time series.
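The read, build and aggregate workflow described for Pandas could look like the sketch below. The CSV content, column names and figures are invented for the example, and the data is read from an in-memory buffer so that the snippet is self-contained; it assumes the `pandas` package is installed.

```python
import io

import pandas as pd

# Hypothetical CSV content; a real script would pass a file path or URL
# to pd.read_csv instead of an in-memory buffer.
csv_text = """city,product,units,price
Paris,book,3,12.0
Paris,pen,10,1.5
Lyon,book,2,12.0
Lyon,pen,4,1.5
"""

sales = pd.read_csv(io.StringIO(csv_text))
sales["revenue"] = sales["units"] * sales["price"]

# Aggregate analysis: total revenue per city.
revenue_by_city = sales.groupby("city")["revenue"].sum()
print(revenue_by_city)
```

From here, `revenue_by_city.plot()` or `sales.to_excel(...)` would cover the plotting and Excel export mentioned above, provided the optional plotting and Excel dependencies are available.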
Newer than Pandas, Agate is also a Python library designed to solve data analysis problems. In particular, it offers functionalities for analyzing and comparing Excel tables, or performing statistical calculations on a database.
Overall, Agate is easier to learn than Pandas. In addition, its data visualization features allow you to visualize the results of the analyses easily and quickly.
Bokeh is an ideal tool for creating visualizations of datasets. It can be used together with Agate, Pandas and other data analysis libraries.
It is also possible to use it with pure Python. This tool allows you to create excellent charts and visualizations without the need for excessive coding.
NumPy is a package used for scientific computing in Python. It is well suited to operations related to linear algebra, Fourier transforms, or random number generation.
It can be used as a generic multi-dimensional data container. Plus, it integrates easily with many different databases.
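The three kinds of operation mentioned for NumPy can be sketched as follows. The system of equations and the signal are toy examples chosen so that the results are easy to check by hand; the snippet assumes the `numpy` package is installed.

```python
import numpy as np

# Linear algebra: solve the system  2x + y = 5,  x + 3y = 10.
coefficients = np.array([[2.0, 1.0],
                         [1.0, 3.0]])
constants = np.array([5.0, 10.0])
solution = np.linalg.solve(coefficients, constants)
print(solution)            # x = 1, y = 3

# Fourier transform of a short signal.
signal = np.array([1.0, 0.0, -1.0, 0.0])
spectrum = np.fft.fft(signal)
print(spectrum)

# Random numbers from a seeded generator, stored in a 2 x 3 array.
rng = np.random.default_rng(seed=0)
samples = rng.normal(size=(2, 3))
print(samples.shape)
```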
SciPy is a library for technical and scientific computing. It includes modules for data science and engineering tasks such as linear algebra, interpolation, FFT, or signal and image processing.
Scikit-learn is very useful for classification, regression or clustering algorithms such as random forests, gradient boosting, or k-means.
This Machine Learning library for Python complements other libraries such as NumPy and SciPy.
PyBrain is an acronym for Python-Based Reinforcement Learning, Artificial Intelligence, and Neural Network Library. As its name suggests, it is a library offering simple but powerful algorithms for Machine Learning tasks.
It can also be used to test and compare algorithms using a variety of predefined environments.
Developed by Google Brain, TensorFlow is a Machine Learning library. Its data flow graphs and flexible architecture allow data operations and calculations to be performed using a single API across multiple CPUs or GPUs from a PC, server, or even mobile device.
Other Python libraries include Cython, which compiles Python code to C to reduce run time. Similarly, PyMySQL allows connecting to a MySQL database, extracting data and executing queries. BeautifulSoup can parse XML and HTML data. Finally, the IPython notebook allows interactive programming.
Learn Python with OpenClassrooms
The OpenClassrooms course is divided into five parts. After a complete introduction to Python, you will learn to master object-oriented programming on the user side, then on the developer side. You will then discover the standard library, and the course concludes with some additional appendices.
The advantage of the OpenClassrooms solution is that it is free, accessible to beginners, and lets you progress at your own pace. In addition, once the training is completed, you can receive a certification recognized by professionals, provided you pass the test exercises.
Some resources to learn the Python language on your own
Several people have uploaded Python learning PDFs or videos for beginners. If you are more of the self-taught type, these resources may be for you. For those who appreciate the video format, Dominique Liard has published a series of videos on YouTube to learn Python.
Version 3.9.7 available since August 2021
The Python Software Foundation has made version 3.9.7 of the eponymous language available. This is the seventh maintenance release since the major update to version 3.9 in October 2020.
Apple M1: Python natively compatible with macOS 11
In December 2020, the core Python developers released version 3.9.1 of the language. This is the first release natively compatible with macOS 11 Big Sur on Apple’s new Arm-based M1 chip.
The core Python team has developed an experimental installer called macos11.0. Thanks to Xcode 11, it is possible to build Universal 2 binaries that run on Apple Silicon chips.
Binaries can be built on current versions of macOS and deployed on older versions of the operating system. This comes as a relief for Data Scientists following Apple’s decision to change its architecture.
Google Atheris: an open source tool for finding Python bugs
Google security experts have “open-sourced” the Atheris tool. It helps find security bugs and vulnerabilities in Python code so that they can be fixed before it’s too late.
This tool is based on the “fuzzing” technique, which consists of feeding an application a large amount of random data and analyzing the result to detect crashes or anomalies. Developers can then look for the underlying bugs in the application code.
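The idea of fuzzing can be sketched with the standard library alone. The code below does not use Atheris and does not reproduce its API: it simply draws inputs at random, and `parse_record` is a hypothetical parser written with a deliberate bug so that there is something to find.

```python
import random


def parse_record(data: bytes) -> bytes:
    """Hypothetical parser with a deliberate bug: it trusts the length byte."""
    length = data[0]               # fails on empty input
    payload = data[1:1 + length]
    checksum = data[1 + length]    # fails when the record is shorter than declared
    if checksum != sum(payload) % 256:
        raise ValueError("bad checksum")   # expected rejection, not a crash
    return payload


def fuzz(target, iterations=1000, max_size=16, seed=0):
    """Feed random byte strings to target and collect unexpected exceptions."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(iterations):
        size = rng.randint(0, max_size)
        data = bytes(rng.randrange(256) for _ in range(size))
        try:
            target(data)
        except ValueError:
            pass                          # documented failure mode
        except Exception as error:        # anything else is a finding
            crashes.append((data, error))
    return crashes


crashes = fuzz(parse_record)
print(len(crashes), "crashing inputs found")
if crashes:
    data, error = crashes[0]
    print("first:", data, type(error).__name__)
```

Each recorded input reproduces an `IndexError`, which points the developer to the unchecked length byte. Dedicated fuzzers are far more efficient than this blind loop, but the feed, observe and report cycle is the same.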
This new tool joins the list of “fuzzers” released as open source by Google since 2013: OSS-Fuzz, Syzkaller, ClusterFuzz, Fuzzilli and BrokenType. However, these earlier tools were aimed at finding bugs in C or C++ applications.
With Python now the 3rd most used language according to the TIOBE index, Google is responding to growing demand with Atheris. The tool, originally developed during an internal hackathon in October 2020, supports fuzzing Python 2.7 and 3.3+ code as well as native extensions created with CPython.
Python remains the most popular programming language in 2021
There are more and more programming languages, so much so that it is becoming difficult for developers to choose which one to learn to further their careers.
In its new report “Where Programming, Ops, AI, and the Cloud are Headed in 2021”, O’Reilly reveals which languages are the most popular at the dawn of 2021. The analysts relied on data from O’Reilly’s online training, from its partners and from its virtual events.
This year again, Python remains the most popular language. Developer interest is even up 27% compared to the previous year.
We can see that this enthusiasm is largely linked to the advantages of Python for Machine Learning. Indeed, usage of the scikit-learn library increased by 11%. The PyTorch framework, used for Deep Learning, has seen its adoption increase by 159%.
Rust could become the language of choice for system programming, namely the creation of new OS and tools for cloud operations. Similarly, Go has established itself as a key language for concurrent programming.
Another trend identified by O’Reilly is the adoption of “low-code” or “no-code” programming, allowing people without coding skills to build applications using tools and software with intuitive graphical interfaces.
However, professional developers are not likely to find themselves unemployed. New languages, libraries and tools used for this type of programming will always require experienced developers to create and maintain them.
Artificial intelligence and machine learning also continue their ascent. Developer interest in AI increased by 64%, compared to 14% for ML. Natural language processing, meanwhile, is up 21%. The most popular platform for Machine Learning is TensorFlow, with a 6% uptick in interest over 2020.
More and more developers want to train in Cloud Computing. Interest in AWS has increased by 5% in one year. Amazon’s cloud therefore remains the most popular, but interest in Microsoft Azure has surged by 136%.
On the Google Cloud side, the increase reached 84%. This trend demonstrates that more and more companies are migrating their data and applications to the cloud.
Finally, the adoption of online training has increased by 96%. This is no surprise, since the COVID-19 pandemic has prevented face-to-face training. Usage of training books also increased by 11%, compared to 24% for educational videos.
Python: two vulnerabilities allowing remote code execution corrected by the PSF
In early 2021, two vulnerabilities affecting the then-current versions of Python were discovered. The “CVE-2021-3177” flaw, a buffer overflow, could lead to remote code execution in Python applications.
Fortunately, in a post published on its blog, the PSF specifies that remote code execution requires many conditions to be met. However, the vulnerability can be used for denial-of-service attacks: an attacker could overflow the buffer in order to crash an application. The second vulnerability, CVE-2021-23336, allowed web cache poisoning.
Following the discovery of these flaws, the Python Software Foundation fixed the two bugs with the release of Python 3.8.8 and 3.9.2. It is therefore important to update the version of Python you are using in order to remove this security threat.
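As background not detailed in the article: according to the public advisory, CVE-2021-23336 concerned the way `urllib.parse.parse_qs` and `parse_qsl` split query strings, since older versions accepted both `&` and `;` as separators while many caches and proxies only recognize `&`. The sketch below shows the behaviour of a patched interpreter (3.8.8, 3.9.2 or later is assumed); the query string is a made-up example.

```python
from urllib.parse import parse_qs

query = "user=alice;role=admin"

# Default behaviour in patched versions: only "&" separates parameters,
# so the ";" stays inside the value of "user".
default_result = parse_qs(query)
print(default_result)

# The separator can be chosen explicitly when ";" is really intended.
semicolon_result = parse_qs(query, separator=";")
print(semicolon_result)
```

On an unpatched interpreter the first call would already return two parameters, and the `separator` argument would not exist, which is a quick way to check whether an installation still needs updating.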
Python will overtake Java and C in the TIOBE Index for the first time
Every month, TIOBE publishes a ranking of the most used programming languages. Over time, this monthly ranking makes it possible to observe trends in the field of coding.
The rating system, in percentage, is based in particular on the volume of searches carried out on Bing, Amazon, YouTube, Wikipedia, Google, Yahoo and Baidu for each programming language.
In June 2021, the C language tops the ranking with a score of 12.54%. However, this rating represents a decrease of 4.65% compared to June 2020.
Python is in second place with a score of 11.84%. The difference between these two languages is therefore only 0.7%. Python’s rating has increased by 3.48% over the past twelve months.
Java then comes third with a score of 11.54%, or 4.56% less than in June 2020. At the time, Java was in second position.
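The figures above can be cross-checked with a few lines of arithmetic. The sketch assumes that the year-over-year changes are differences in rating points (for example 12.54 now versus 12.54 + 4.65 a year earlier), which is an interpretation rather than something the text states; `Decimal` is used to avoid floating-point rounding noise.

```python
from decimal import Decimal

# Ratings for June 2021 and the year-over-year change, as quoted in the text.
ratings_2021 = {
    "C": Decimal("12.54"),
    "Python": Decimal("11.84"),
    "Java": Decimal("11.54"),
}
change = {
    "C": Decimal("-4.65"),
    "Python": Decimal("3.48"),
    "Java": Decimal("-4.56"),
}

gap = ratings_2021["C"] - ratings_2021["Python"]
print("C minus Python, June 2021:", gap)

# Subtracting the change recovers the June 2020 ratings.
ratings_2020 = {name: ratings_2021[name] - change[name] for name in ratings_2021}
ranking_2020 = sorted(ratings_2020, key=ratings_2020.get, reverse=True)
print(ratings_2020)
print(ranking_2020)
```

Under that assumption the June 2020 ratings come out at 17.19 for C, 16.10 for Java and 8.36 for Python, which is consistent with Java having been second at the time.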
According to Paul Jansen, CEO of TIOBE Software, Python will reach the top spot in the rankings very soon. This rise could come in July 2021, when the TIOBE index itself will celebrate its 20th anniversary.
During these two decades, first place has been held only by C and Java. Python taking the lead would therefore represent a historic turning point in the history of computing…
SQL closes the top 10 with a rating of 1.88%, an increase of 0.15% compared to June 2020.
Outside the top 10, Classic Visual Basic has moved up eight places in one year. Number 12, Groovy, moved up 19 places and number 17, Fortran, moved up 20 positions. On the other hand, R and Swift have lost five places each and fall to positions 14 and 16 respectively. The top 20 is closed by MATLAB, which has lost four places, and Go, which has lost eight.
Promising languages for the future include Dart, Kotlin, Julia, Rust, TypeScript, and Elixir. For now, these recent languages are still far from the top and have not really moved in the rankings over the past year.
Over the first six months of 2021, the global developer community has enjoyed strong growth, as highlighted by a report published by SlashData.
According to this study, there were 24.3 million developers worldwide in the first quarter of 2021. This is an increase of approximately 14% compared to the 21.3 million recorded in October 2020.
Python comes in second place with a community of 10.1 million developers. This community is growing at a rate of 20%, the highest growth rate of any programming language.
According to the report, the popularity of Python is largely linked to the rise of Data Science and Machine Learning. Indeed, nearly 70% of Data Scientists and Machine Learning developers use Python. In comparison, only 17% use R.
In the ranking of the largest communities, Java comes next with 9.4 million developers, followed by C/C++ at 7.3 million and C# at 6.5 million. Android’s Kotlin language just overtakes iOS’s Swift, with 2.6 million and 2.5 million developers respectively.
Python 4.0 may never see the light of day, according to its creator
According to Guido van Rossum, the creator of Python, version 4 of the language may never see the light of day. This is mainly due to the many difficulties encountered during the migration from Python 2 to Python 3, which began in 2008.
Asked about this during an interview with Microsoft Reactor, Van Rossum explained that neither he nor the members of the core Python development team are keen to release a 4.0 version, given the many setbacks encountered during the previous major update.
Since Python 3 is not compatible with Python 2, developers who maintained software libraries built on Python 2 could not simply update them to Python 3. A long migration period followed, which extended over several years and left the creator of the language with bitter memories. As a reminder, the Python 2 lifecycle ended in April 2020 with version 2.7.18.
The only reason Python 4.0 would see the light of day would be a major change in compatibility with C. The update would then be essential.
Other than that, Python will continue to follow a strict annual release schedule. 3.x versions will continue up to 3.99, then another digit will be added after the decimal point if needed.
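A version number such as 3.100 has a practical consequence for scripts that compare versions as text. The sketch below illustrates the pitfall; `parse_version` is a hypothetical helper written for this example, not a standard-library function.

```python
import sys

# Comparing version strings breaks once the minor number reaches three digits:
# strings are compared character by character, so "3.100" sorts before "3.99".
print("3.100" > "3.99")        # False


def parse_version(text):
    """Turn a dotted version string such as '3.100' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# Tuples of integers compare number by number, which gives the right order.
print(parse_version("3.100") > parse_version("3.99"))   # True

# sys.version_info already behaves like such a tuple.
print(sys.version_info >= (3, 9))
```

Comparing against `sys.version_info` with a tuple, rather than parsing `sys.version` as text, avoids the problem altogether.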
Python could become 5 times faster in 5 years
Despite its many good qualities, one of Python’s main weaknesses is its slowness. Compared to C++ or Java, this interpreted language with a high level of abstraction is significantly slower.
However, things could change with future releases. During the Python Language Summit, Guido van Rossum, creator of the language, announced that the speed will be doubled with version 3.11, expected in October 2022.
And that’s just the beginning. A new version will be rolled out every year, and the current speed is expected to increase fivefold within five years.
In a presentation posted on GitHub, Van Rossum explains how he plans to achieve this feat. An adaptive interpreter, frame stack optimization, and “zero overhead” exception handling are among the avenues being considered.
Other changes are to be expected, such as an ABI (Application Binary Interface) or an automatic code generator to continue to speed up Python. Speed therefore now seems to be the top priority for the creators of Python.
Source: French site lebigdata.fr