blogs - Mantra AI

Image detection and Augmented Reality tools

The ongoing hype around Augmented Reality (AR) has sparked an interesting conversation about the possibilities of image detection. After Facebook’s F8 conference and Google I/O 2017, it has become inevitable that the tech giants get in the game.

Though VR has been available to everyone (thanks to Google’s Cardboard), AR is not yet prominent in public use. When you think of Augmented Reality, one of the key elements to consider is object or image recognition technology, also known as image detection.

Recently Microsoft with its HoloLens and Apple with its ARKit spiked everyone’s interest by insisting that AR is the future of current technology.


Let’s have a look at a couple of Augmented Reality frameworks for a better understanding.



Kudan is a framework that supports markerless AR. With Kudan, whether the content is a video, a 3D render or a transparent video, all you have to do is set it up with the related image as a trigger. You won’t need to go through the hassle of learning another framework or a program such as Unity. Have a look at this example video.

It also includes the KudanAR toolkit for converting 3D object formats to the ARModel format. The toolkit supports 3D object formats such as FBX, OBJ and DAE, which most of the tools available nowadays can render easily.

One good part of this framework is that the media is viewable from different angles and resumes automatically when it is moved off the trigger and placed back on it.


Vuforia has one of the best tracking and augmentation solutions out there. We can use Vuforia when we have to detect an image, show a 3D object on it, and then interact with that 3D object. For example, you could build an app that detects an image and shows a lamp at that position, with an on/off feature and a colour-change feature. See the example video.

In my experience, Vuforia makes it easy to create a marker, add a target with that marker to the scene, and attach whatever AR object you want to associate with the target as a child of the image target object (the 3D object should be in OBJ or FBX format).

Unity is a cross-platform game engine used to develop games and Augmented Reality apps. We can develop high-quality 3D and 2D games using Unity.

One downside of Vuforia is the lack of markerless AR support.

Both of these frameworks come with a free developer license and offer extensive usage options and flexibility for development. In the end, the choice of framework depends on the flexibility, comfort and requirements of the developer and the application.

Do you have any suggestions about what AR might be useful for? Please leave a comment!


The Five Myths of AI


Debunking AI Myths – AI is a lot more than Algorithms and Robots

AI is a complicated yet lucrative technology, needed by many and understood by few. While AI covers everything from machine intelligence to technology, it also attracts some common misconceptions: that AI is the only solution your industry needs, that AI can think and reason by itself, or that AI is the next generation of the human machine. Which one do you believe?

Let’s dive in and clear the dust off the myths that prevail about AI. Are you ready?

Myth One – AI is synonymous with Algorithms

AI cannot work unless the data provided to it is structured, parsed and accurate. For example, an early hypothesis held that if we let two dictionaries of different languages talk to each other, we might create a translator. However, this could only be executed once the two dictionaries were treated as big data, structured, and mashed together to provide accurate results. So if we have an algorithm that converts Spanish to English but the data fed in is German and English, it is bound to go wrong.

Myth Two – AI is to replace humans

Be it your marketing strategy or your virtual assistant, any solution in the modern world could be AI-enabled, but that never means it would lack the human touch. While automation and AI could replace humans in less skilled jobs such as folding papers or cleaning bins, there is still a place for qualified and experienced professionals.

Myth Three – Moving things like Robots are AI

Do you think a drone or HDFC Bank’s IRA is an AI solution? Well, these are just machines that can move and work as their design intends: robotic assistance for IRA and travel for drones.

Myth Four – An AI solution could solve all your business problems

AI is not a magic wand that can eradicate all your problems from the design process through delivery and maintenance. AI can help at each stage, but what is necessary is to look at the quantity and quality of data at each step, feed it to the algorithm, apply some human intelligence, and only then get your problem solved.

Myth Five – AI solutions produce results instantly

How much time do you think you would take to learn two languages? Or how about a calculation like the one pictured here?

[Image: a sample calculation. Courtesy – Softpedia]

We are sure that unless you are a great mathematician or a person with a superb IQ, you would need to work on it step by step. Similarly, AI tries to match human cognition, which means training on data, building models and finally optimizing the algorithm. So, do not expect results to be quick.

However, the question remains: what type of future do we need to build with AI? Self-destructive weapons or self-treating clinics? Superhumans or job automation? Will intelligent machines supersede us or coexist with us in harmony? Only time can tell.



Why use Python? : Advantages and features of Python

Python is a general-purpose language, which means you can build almost anything with the right tools and libraries. It is a dynamic, object-oriented and multipurpose programming language designed to be easy to learn and use, and to enforce a clean and uniform syntax.

Professionally, Python is great for backend web development, data analysis, artificial intelligence, and scientific computing. Many developers have also used Python to build productivity tools, games, and desktop apps.

Some of the biggest advantages are:

  • Easy to read and easy to learn
  • Very productive for small as well as big projects
  • Big libraries for many things

Some Key Features

1. Dynamically typed:

There is no need for a type declaration for a variable. Instead, you have variable names, and you bind them to entities whose type stays with the entity itself. a = 5 makes the variable name ‘a’ refer to the integer 5. Later, a = ‘hello’ makes the variable name ‘a’ refer to a string containing “hello”. A statically typed language would have you declare ‘int a’ and then assign a = 5, but assigning a = ‘hello’ would then be a compile-time error.
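A minimal sketch of the rebinding described above; the helper names type_before and type_after are ours, added only to make the change visible.

```python
# A name is just a label; the type lives with the object it points to.
a = 5
type_before = type(a).__name__   # type of the object currently bound to a

a = "hello"                      # rebinding the same name to a str is legal
type_after = type(a).__name__

print(type_before, type_after)
```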

2. Strongly typed:

It means that a = “5” (the string whose value is ‘5’) will remain a string and is never coerced to a number, even if the context seems to require one. Conversions between unrelated types, such as strings and numbers, have to be done explicitly in Python.
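As a small illustration, the sketch below tries to add a string and an integer, which Python rejects, and then does the two explicit conversions instead. The variable names are ours.

```python
a = "5"

try:
    result = a + 5          # str + int is not coerced automatically
except TypeError:
    result = None           # Python refuses instead of guessing

as_number = int(a) + 5      # explicit conversion to int
as_text = a + str(5)        # explicit conversion to str

print(result, as_number, as_text)
```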

3. Object Oriented with class-based inheritance:

Everything is an object (including classes, functions, modules, etc.), in the sense that it can be passed around as an argument, have methods and attributes, and so on.
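To make this concrete, here is a short sketch with made-up names (shout, Greeter, apply) showing a function being passed as an argument and classes and numbers being treated as objects.

```python
def shout(text):
    """Return the text in upper case."""
    return text.upper()

class Greeter:
    """A trivial class used only for illustration."""
    def greet(self, name):
        return "Hello, " + name

def apply(func, value):
    # Functions are objects, so they can be passed as arguments.
    return func(value)

shouted = apply(shout, "hi")
function_name = shout.__name__                  # functions carry attributes
class_is_object = isinstance(Greeter, object)   # classes are objects too
number_is_object = isinstance(5, object)        # so are plain integers
```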

4. Multipurpose:

It is not specialized for a specific target group of users (like PHP for web programming). It has extensible modules and libraries that hook very easily into the C programming language.

5. Indentation:

There are no braces for control blocks in Python. The level of indentation identifies the blocks of code. Although this is a big turn-off for many programmers who are not used to it, it is valuable because it gives a very uniform style and results in code that is visually pleasant to read.


Python source code is compiled into bytecode, which is then executed in a virtual machine. The precompiled bytecode is portable between platforms, as long as it is run by the same version of the interpreter.
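The compile-then-execute step can be observed from Python itself. The sketch below uses the built-in compile() and the standard dis module; the one-line source string is just an example, and the exact instructions listed differ between Python versions.

```python
import dis

source = "total = 2 + 3"

# compile() turns source text into a code object that holds bytecode.
code_obj = compile(source, "<example>", "exec")

# The virtual machine executes the bytecode; names land in the namespace dict.
namespace = {}
exec(code_obj, namespace)

# dis lists the bytecode instructions (exact opcodes vary by Python version).
instructions = [ins.opname for ins in dis.get_instructions(code_obj)]
print(namespace["total"], instructions)
```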

What is Python Programming Language used for?

Users can easily use Python for small, large, online and offline projects. The best options for utilizing Python are web development, simple scripting, and data analysis.

Below are a few examples of what Python will let you do:

Web Development:

You can use Python to create web applications at many levels of complexity. There are many excellent Python web frameworks, including Pyramid, Django and Flask, to name a few.

Data Analysis:

Python is the language of choice for many data scientists. It has grown in popularity within this field thanks to excellent libraries such as NumPy and Pandas, and superb data visualization libraries such as Matplotlib and Seaborn.

Machine Learning:

What if you could predict customer satisfaction, analyze which factors affect house prices, or predict stock prices over the next few days based on previous years’ data? There are many wonderful libraries implementing machine learning algorithms, such as Scikit-Learn, NLTK and TensorFlow.

Computer Vision:

You can do many interesting things, such as face detection and color detection, using OpenCV and Python.

Internet Of Things With Raspberry Pi:

Raspberry Pi is a very tiny and affordable computer for education. It has gained enormous popularity among hobbyists with do-it-yourself hardware and automation. You can even build a robot and automate your entire home. Raspberry Pi can be used as the brain for your robot in order to perform various actions and/or react to the environment. The Possibilities are endless!

Game Development:

Create a video game using the Pygame module. Basically, you use Python to write the logic of the game. Pygame applications can also run on Android devices.

Web Scraping:

If you need to grab data from a website but the site does not have an API to expose the data, use Python to scrape it.
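As a rough sketch of what scraping looks like, the example below pulls every link out of a page using only the standard library’s html.parser. The HTML string is a made-up stand-in for a downloaded page, and LinkCollector and extract_links are our own illustrative names; many projects use third-party libraries for this instead.

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag seen while parsing."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

# Hypothetical page content; in practice this would come from an HTTP response.
SAMPLE_HTML = """
<html><body>
  <a href="/blog/ai-myths">AI myths</a>
  <a href="/blog/why-python">Why Python</a>
  <p>No link here</p>
</body></html>
"""

def extract_links(html):
    """Parse the HTML text and return the collected link targets in order."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.links

links = extract_links(SAMPLE_HTML)
print(links)
```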

Writing Scripts:

If you’re doing something manually and want to automate the repetitive parts, such as sending emails, it’s not difficult once you know the basics of the language.

Browser Automation:

You can perform some neat things, such as opening a browser and posting a Facebook status, using Selenium with Python.

GUI Development:

Build a GUI application (desktop app) using Python modules such as Tkinter or PyQt.

Rapid Prototyping:

Python has libraries for just about everything. Use it to quickly build a (lower-performance, often less powerful) prototype. It is also great for validating ideas or products, for established companies and start-ups alike.

If you are new to programming, Python is the perfect choice for learning quickly and easily because the community provides many introductory resources.

You can find a detailed list of organisations using Python here.


Amazon Cognitive Services

Amazon has launched three new cognitive services:

  1. Rekognition – Object and facial analysis
  2. Polly – Text into speech
  3. Amazon Lex – Chatbot for voice and text


Amazon Rekognition is a service that makes it easy to add image analysis to your applications. 

Four functions are provided in this API:

  • Object and Scene Detection: Rekognition identifies various objects of interest, such as vehicles, pets or furniture, and provides a confidence score for each.
  • Image Moderation: It detects adult content in the image and provides suitable labels for the adult content detected. Cons: It does not classify images with violence/bloodshed as adult content.
  • Facial Analysis: You can locate faces within images and analyze face attributes, such as whether or not the face is smiling or the eyes are open, each with a confidence score.
  • Face Comparison: Rekognition lets you measure the likelihood that faces in two images are of the same person. Cons: The similarity measure for two faces of the same person depends on age. Also, a localised increase in the illumination of the face alters the results of the face comparison.
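To give a feel for working with the confidence scores, here is a hedged sketch. detect_labels_in_file shows how the object and scene detection call could look with the boto3 SDK (it needs AWS credentials and is not run here), while confident_labels filters a response by score. The function names, the thresholds and the sample response are our own illustration, not output from the service.

```python
def detect_labels_in_file(path, min_confidence=75.0):
    """Call Rekognition DetectLabels on a local image (needs AWS credentials)."""
    import boto3  # imported lazily so the helper below works without it
    client = boto3.client("rekognition")
    with open(path, "rb") as image_file:
        return client.detect_labels(
            Image={"Bytes": image_file.read()},
            MinConfidence=min_confidence,
        )

def confident_labels(response, threshold):
    """Return (name, confidence) pairs at or above the threshold, best first."""
    pairs = [
        (label["Name"], label["Confidence"])
        for label in response.get("Labels", [])
        if label["Confidence"] >= threshold
    ]
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)

# Made-up response in the shape DetectLabels returns, for illustration only.
sample_response = {
    "Labels": [
        {"Name": "Furniture", "Confidence": 81.2},
        {"Name": "Vehicle", "Confidence": 97.5},
        {"Name": "Pet", "Confidence": 54.0},
    ]
}

top = confident_labels(sample_response, threshold=80.0)
print(top)
```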


Amazon Polly is a service that turns text into lifelike speech. Polly lets you create applications that talk, enabling you to build entirely new categories of speech-enabled products. 

  • 47 voices and 24 languages are available, including an Indian English option.
  • Tones such as whispering, anger, etc. can be added to a particular part of the speech using “amazon effects”.
  • We can also instruct the system to pronounce a particular phrase or word in a different way, e.g. W3C pronounced as World Wide Web Consortium. We can also give the input text in SSML format.
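The W3C example can be written in SSML with the standard sub tag and its alias attribute. The sketch below only builds and checks the markup; with_alias is a hypothetical helper of ours, and sending the text to Polly (as SSML input) is left out.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape, quoteattr

def with_alias(sentence, term, spoken_as):
    """Wrap `term` in an SSML <sub> tag so it is read aloud as `spoken_as`."""
    safe_sentence = escape(sentence)
    safe_term = escape(term)
    tag = "<sub alias=%s>%s</sub>" % (quoteattr(spoken_as), safe_term)
    return "<speak>" + safe_sentence.replace(safe_term, tag) + "</speak>"

ssml = with_alias("The W3C publishes web standards.", "W3C",
                  "World Wide Web Consortium")

# Parsing confirms the markup is well-formed XML before it is sent anywhere.
root = ET.fromstring(ssml)
alias = root.find("sub").get("alias")
print(ssml)
```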

Amazon Lex is a service for building conversational interfaces into any application using voice and text.

Cons: There is no synonym option, and entity extraction and intent classification are not very accurate.

Note: Amazon has not launched a speech-to-text conversion API so far.


6 biggest keynotes from Apple’s WWDC 2017


The keynote of Apple’s Worldwide Developers Conference (WWDC 2017) brought announcements about iPhones, MacBooks, Apple TV and more, along with news about what Apple is doing with macOS, its hardware, iOS 11, and big features like augmented reality and the launch of a smart speaker.

Here are some highlights:

Apple Watch – watchOS 4 has Siri

An update for the Apple Watch is coming that introduces new faces displaying different types of information, such as:

  • Automatically displays information based on your routines and the apps you use, using machine learning to show relevant alerts on the watch
  • Personalised activity notifications based on your exercise patterns
  • Personalised goals and challenges for things you could not do earlier or came close to finishing

macOS High Sierra

The latest version of macOS will be called High Sierra, and it comes with updates such as:

  • Safari browser – helps block site trackers
  • Control over autoplaying videos and ads
  • Cookie controls to avoid being tracked
  • Better search
  • Photos – new photo-editing tools like curves, and better filtering tools to sort images by keywords or faces
  • Advanced neural networks used for facial recognition
  • Support for VR content creation libraries, SDKs and engines


  • Messages are stored in the cloud, with integrated P2P payments.
  • Siri uses deep learning and has more varied speech capabilities.
  • Allows translation between languages.
  • Improves safety while driving: a drive mode uses Doppler, Bluetooth and Wi-Fi readings to intelligently identify when you are driving and stop notification alerts.
  • The camera is improved with AR capability: it can identify surfaces and add objects to them, and object interactions such as shadows from added lights are taken care of.


It has a predictive area that identifies which application you may want to use next, based on machine learning about your app usage. It also has searchable handwritten notes: the OS recognises what is written and allows searching within handwritten notes.


It has spatial recognition that adjusts the music quality to the room it is being used in. Support for Siri, and acting as a HomeKit base, allows you to control HomeKit devices remotely.


iOS 11

What you’ll be looking forward to in iOS 11:

  • You can now type to Siri
  • Create and capture GIFs now
  • Redesigned podcast app
  • QR code support

Stay tuned for more updates.


Difference between Face Detection, Face Recognition and Facial Analysis

Artificial Intelligence (AI) attempts to create machines that simulate human intelligence to identify and use the right pieces of knowledge at the time of decision-making and problem-solving. It deals with computational models that can think and behave the way humans do.

Computer Vision is a super exciting part of Artificial Intelligence where we attempt to get intelligence out of visual data. That intelligence can be scene/object detection, face detection, face recognition or facial analysis.

A common misconception about face detection needs to be highlighted. To be exact, let us try to understand the computer vision terms properly.


1. Face Detection: Finding the faces (any) in an image/frame. It does not care about “whose face”; it only locates the faces, which makes it possible to count the number of persons in the given image/frame. It can be used, for example, to know the number of persons in a conference/store.

2. Face Recognition: Recognizing the face in an image/frame. It identifies that a face belongs to person X. When you upload a picture on Facebook, you get recommendations for tagging your friends or yourself. That is the face recognition capability of Facebook.

3. Facial Analysis: Analysis of a face in terms of age group, sex, expression, etc. If you use this capability, it can help you learn detailed information about your customers in a store.
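The difference between detection and recognition can be sketched in a few lines. Here detection is represented simply by a list of faces found, and recognition by matching each face against known people using the distance between feature vectors. The three-number “embeddings”, the names and the 0.6 threshold are all made up for illustration; real systems get such vectors from a trained model.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def recognise(embedding, gallery, max_distance=0.6):
    """Return the closest known name, or None if nobody is close enough."""
    best_name, best_dist = None, float("inf")
    for name, known in gallery.items():
        dist = euclidean(embedding, known)
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= max_distance else None

# Toy vectors for two known people (hypothetical values).
gallery = {"alice": [0.1, 0.9, 0.3], "bob": [0.8, 0.2, 0.5]}

# Detection output: faces were found, but nothing says whose they are.
detected_faces = [[0.12, 0.88, 0.31], [0.9, 0.9, 0.9]]
face_count = len(detected_faces)

# Recognition: attach an identity to each detected face where possible.
names = [recognise(face, gallery) for face in detected_faces]
print(face_count, names)
```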

Amazon (AWS) has launched the Amazon Rekognition API to perform the above activities. IBM Watson offers the Visual Recognition API for similar activities, whereas Microsoft Azure has the Face API. There are other companies (service providers) that offer similar services in a customized manner. Please explore and get the best out of the latest technologies.

Please feel free to share your comments.



7 best reasons to adopt Blockchain in your business

Blockchain has been in the news for more than a couple of years. Many companies have gone ahead with the idea of implementing blockchain. Recently, finance and supply chain companies have shown special interest in understanding and implementing it. Other sectors, including government bodies, have shown interest in adopting blockchain. Even Dubai has shown keen interest in implementing blockchain for all government activities.

Blockchain is a public ledger (and distributed database) that keeps records of transactions between two parties in units called blocks, chronologically and publicly, which is why it is transparent and immutable. Thanks to the peer-to-peer network and the distributed timestamping server, the public ledger (database) is managed autonomously. Smart contracts, automatic transactions, etc. can easily be programmed for fast actions.

Blockchain can be used for creating and maintaining the following:
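To see why a chain of blocks resists tampering, consider this minimal, single-machine sketch: every block stores the hash of the previous one, so changing an old record breaks the chain. It leaves out the peer-to-peer network, mining and consensus entirely, and all names and transactions are made up.

```python
import hashlib
import json

def block_hash(index, timestamp, transactions, previous_hash):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "timestamp": timestamp,
         "transactions": transactions, "previous_hash": previous_hash},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()

def add_block(chain, timestamp, transactions):
    """Append a new block that points at the current last block."""
    previous_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    block = {
        "index": index,
        "timestamp": timestamp,
        "transactions": transactions,
        "previous_hash": previous_hash,
        "hash": block_hash(index, timestamp, transactions, previous_hash),
    }
    chain.append(block)
    return block

def is_valid(chain):
    """Recompute every hash and check each block points at its predecessor."""
    for i, block in enumerate(chain):
        expected_prev = chain[i - 1]["hash"] if i > 0 else "0" * 64
        recomputed = block_hash(block["index"], block["timestamp"],
                                block["transactions"], block["previous_hash"])
        if block["previous_hash"] != expected_prev or block["hash"] != recomputed:
            return False
    return True

chain = []
add_block(chain, 1, [{"from": "alice", "to": "bob", "amount": 5}])
add_block(chain, 2, [{"from": "bob", "to": "carol", "amount": 2}])
valid_before = is_valid(chain)

chain[0]["transactions"][0]["amount"] = 500   # tamper with an old record
valid_after = is_valid(chain)
print(valid_before, valid_after)
```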


A. Digital currency to be used in e-commerce, remittance, micro-finance and other similar contexts.


B. Smart contracts to be used in online shops, government and non-government deals


C. Securities to be used for debt, equity and crowd-funding


D. Record keeping to be used for hospitals, government bodies, voting, real estate and intellectual properties


You may be curious to know why you should implement Blockchain. Here are the reasons.


Reason 1 : Decentralized : It offers decentralization (peer-to-peer networks). So, it is a good idea to go beyond the paradigm of a central system for anything and everything.


Reason 2 : Simplified Ledger : As all the transactions are added to a single ledger (blockchain), it avoids complications of multiple ledgers.


Reason 3 : Less prone to attacks : Thanks to the decentralized network, blockchain can withstand malicious attacks more easily because it does not depend on a central system.


Reason 4 : Fast : As the blockchain is programmatic, it takes only a few moments to execute transactions.


Reason 5 : Transparent and Immutable : Any change in a public blockchain can easily be viewed by miners, so no transaction can be altered or deleted.


Reason 6 : Automated : Since the details of contracts, deals and currency are well defined in a programmatic ecosystem, transactions are automated.


Reason 7 : Affordable : Unlike the costly transaction fees charged by central systems such as banks, it can provide an ecosystem with quite low transaction costs.

Ethereum is an open-source blockchain that anyone can use as a decentralized ledger. Corporates support it through the Enterprise Ethereum Alliance (EEA).


Hope you have got some take-away points. I wish you the best for continuous learning.


IBM Cognitive Services: Speech API, Visual API and Language APIs – Part 2

This is the second and last part of my previous article. Here is a little recap in case you missed it.

In the previous part, I talked about cognitive computing, services, and Watson APIs, and covered the Speech/Voice and Visual APIs. Now I am going to describe how the Language Processing APIs work.


1. Conversation

The Conversation service helps to create an application that understands natural-language input and uses machine learning to simulate a human and respond to customers. It is an out-of-the-box solution that allows you to quickly build, test and deploy a bot or virtual agent across mobile devices, messaging platforms or physical robots. Conversation has a visual dialog builder to help you create natural conversations between your apps and users, without any coding experience required.

Supported languages: Brazilian Portuguese, English, French, Italian, Spanish, German, Traditional Chinese, Simplified Chinese, Dutch and Arabic.

2. Natural Language Classifier

The Natural Language Classifier service uses machine learning algorithms to return the top matching predefined classes for short text inputs. The service understands the intent behind the text and returns a corresponding classification with a confidence score. It can help your application understand the language of short texts and make predictions about how to handle them. It can be used to answer questions in a call center, create chatbots, categorize volumes of written content and much more.

It involves the following steps:

  • Prepare training data
      • Identify class labels
      • Collect representative text
      • Match classes to text
  • Create and train classifier
      • Use API to upload training data
  • Query the trained classifier
      • Use API to retrieve data
  • Evaluate results and update training data
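The steps above can be mimicked locally with a toy word-overlap classifier. This is only a stand-in to show the prepare, train and query cycle; it is not how the Watson service works internally, and the training texts, labels and function names are invented.

```python
from collections import Counter

# Step 1: training data as (text, class label) pairs. Made-up examples.
TRAINING = [
    ("how do i reset my password", "account"),
    ("i forgot my password", "account"),
    ("where is my order", "shipping"),
    ("my order has not arrived", "shipping"),
]

def train(examples):
    """Step 2: count how often each word appears under each class."""
    model = {}
    for text, label in examples:
        model.setdefault(label, Counter()).update(text.lower().split())
    return model

def classify(model, text):
    """Step 3: score each class by word overlap; normalise into confidences."""
    words = text.lower().split()
    scores = {label: sum(counts[w] for w in words) + 1   # +1 avoids an all-zero total
              for label, counts in model.items()}
    total = sum(scores.values())
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(label, score / total) for label, score in ranked]

model = train(TRAINING)
ranking = classify(model, "i lost my password")
top_class, top_confidence = ranking[0]
print(ranking)
```

Step 4, evaluating the results, would mean checking such rankings against texts whose class is already known and adding training examples where the classifier goes wrong.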

Supported languages: English, Arabic, Brazilian Portuguese, French, German, Japanese, Italian, and Spanish.

3. Natural Language Understanding

Natural Language Understanding can analyze semantic features of text input, including categories, concepts, emotion, entities, keywords, metadata, relations, semantic roles, and sentiment. It categorizes content using a five-level classification hierarchy. View the complete list of categories here.

Full tutorial for Natural Language Understanding
Watson Knowledge Studio can be used to customize annotation models, identify industry/domain specific entities and relations in unstructured text.

4. Retrieve and Rank

Retrieve and Rank helps us get the most relevant information from a collection of documents. The Retrieve and Rank service combines two information retrieval components in a single service: the power of Apache Solr and a sophisticated machine learning capability. This combination provides users with more relevant results by automatically reranking them with machine learning algorithms.

How it works:

  • Collect and load content
      • Collect content
      • Modify and upload Solr configuration file
      • Upload content
  • Train the machine learning rank model
      • Collect queries and relevant answers to leverage as training data
      • Create and upload training data
  • Query service
      • Send runtime queries to trained model
  • Evaluate results and improve model
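The two-stage idea can be sketched in plain Python: a first pass retrieves candidates by word overlap (standing in for Solr), and a second pass reorders them using relevance feedback (standing in for the trained rank model). Averaging past judgements per document is a deliberately crude substitute for real learning-to-rank, and the documents, judgements and function names are all invented.

```python
DOCS = {
    "d1": "reset your password from the account settings page",
    "d2": "track your order from the shipping page",
    "d3": "password rules require eight characters",
}

def retrieve(query, docs, limit=10):
    """Stage 1: keep documents sharing at least one word with the query."""
    terms = set(query.lower().split())
    hits = []
    for doc_id, text in docs.items():
        overlap = len(terms & set(text.lower().split()))
        if overlap:
            hits.append((doc_id, overlap))
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:limit]

def train_ranker(judgements):
    """Stage 2 (training): average the relevance given to each document."""
    totals, counts = {}, {}
    for doc_id, relevance in judgements:
        totals[doc_id] = totals.get(doc_id, 0) + relevance
        counts[doc_id] = counts.get(doc_id, 0) + 1
    return {doc_id: totals[doc_id] / counts[doc_id] for doc_id in totals}

def rerank(hits, ranker):
    """Stage 2 (query time): reorder retrieved hits by learned relevance."""
    return sorted(hits, key=lambda hit: (ranker.get(hit[0], 0), hit[1]),
                  reverse=True)

hits = retrieve("forgot password characters", DOCS)
first_stage = [doc_id for doc_id, _ in hits]

# Hypothetical feedback: users with this kind of query wanted the reset page.
ranker = train_ranker([("d1", 1.0), ("d1", 1.0), ("d3", 0.0)])
second_stage = [doc_id for doc_id, _ in rerank(hits, ranker)]
print(first_stage, second_stage)
```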

5. Tone Analyzer

The Tone Analyzer service uses linguistic analysis to detect communication tones in written text. It helps you understand tone in three general categories (emotional, social and language) and in seven categories specific to customer service and support conversations: Sad, Frustrated, Satisfied, Excited, Polite, Impolite and Sympathetic. Each category gets a score from 0 to 1 that indicates the probability that the tone is present in the content. It can be used to understand communications and respond to customers appropriately.

6. Personality Insights

The Personality Insights service allows applications to infer the personality of individuals from content such as social media, enterprise data, or other digital communications like email, text messages, tweets, and forum posts. The service uses linguistic analytics to infer individuals’ personality characteristics, including Big Five, Needs, and Values. It can also find out individuals’ consumption preferences for various products, services, and activities. The service can help businesses understand their customers or clients at a deeper level by understanding their preferences. It can be used to improve client acquisition, retention, and engagement, and to guide highly personalized interactions so that products, services, campaigns, and communications are tailored to individual clients.

7. Document Conversion

The Document Conversion service can transform a single HTML, PDF, or Microsoft Word document into normalized HTML, plain text, or a set of JSON-formatted content that can be used by other services, such as Retrieve and Rank, Personality Insights and Conversation.

Supported languages: English, French, German, Japanese, Italian, Brazilian Portuguese, and Spanish.

8. Language Translator

Language Translator translates text from one language to another. The service offers multiple domain-specific models that can be customized based on your terminology and language. The service can be trained and customized over time to provide better accuracy, as Watson learns from previous translations. The service takes specific terms and phrases into account, such as the names of people or products, to ensure that they are translated correctly.

Supported languages:
News: English to and from Arabic, Brazilian Portuguese, French, German, Italian, and Spanish; Spanish to and from French.
Conversational: English to and from Arabic, Brazilian Portuguese, French, Italian, and Spanish.
Patents: Targeted at technical and legal terminology. Brazilian Portuguese, Chinese, and Spanish to English.


Independently, the services listed are sufficient for only a few use cases, but together they can solve large business problems and can challenge and disrupt many existing business models. In subsequent articles, we will share more about Watson Knowledge Studio, Watson Discovery, and Watson Explorer. We will also share real-world use cases of AI in multiple businesses.


IBM Cognitive Services: Speech API, Visual API and Language APIs – Part 1

Cognitive computing is becoming the next essential of business today. This is the next leap in the technology race.

First, let us understand what Cognitive Computing is and why this term has become a business essential lately.

“Cognitive Computing is the process of mimicking the way the human brain works. It involves various artificial intelligence applications and machine learning algorithms such as Natural Language Processing, neural networks, virtual reality and robotics.”

There are many players in the market who are providing cognitive capabilities and services. IBM Watson is one of them which provides the latest cognitive computing through its products and APIs.

The key enablers for this technology shift are:

  • Data: IoT and mobility have led to exponential growth of data.
  • Computation power: In the era of cloud computing, computing power is no longer a limit.
  • Artificial Intelligence algorithms: Many AI algorithms are available thanks to years of research work by scholars.

Here are some of the most important Watson APIs, which we are using to provide services:


Speech to Text

The Speech to Text service lets you add speech transcription capabilities to your applications. The service leverages machine intelligence to combine language grammar and language structure with knowledge of the composition of the audio signal. The service continuously returns and updates the transcription as more speech is heard. For speech with multiple speakers, it labels each speaker’s part of the conversation.

It can return alternative and interim transcription results. It can also apply filtering to sanitize the output. It can convert dates, times, numbers, phone numbers, and currency values in final transcripts of US English audio into more readable, conventional forms.

Supported languages: Brazilian Portuguese, French, Japanese, Mandarin Chinese, Modern Standard Arabic, Spanish, English.

Text to Speech

The Text to Speech service uses speech-synthesis capabilities to convert written text to natural-sounding speech in real time (with minimal delay). The service accepts plain text or text that is tagged with the Speech Synthesis Markup Language (SSML), an XML-based markup language that provides annotations of text for speech. SSML includes an expressive element that lets you indicate a speaking style or emotion. SSML also lets you control the voice by adjusting pitch, rate, and timbre. Text to Speech provides a customization interface that lets you specify how it pronounces unusual words that occur in your input.

Supported languages: English, French, German, Italian, Japanese, Spanish, and Brazilian Portuguese. The service offers at least one male or female voice for each language, sometimes both.


Visual Recognition

The Visual Recognition service uses deep learning algorithms to analyze images for scenes, objects, faces, and other content. It understands the contents of images and can tag an image, find human faces, approximate age and gender, and find similar images in a collection. The response includes keywords that provide information about the content. A set of built-in classes provides highly accurate results without training, but the service can also be trained with custom classifiers to create specialized classes. Custom functionality can be built around it, for example to detect a product in a shop or identify damaged inventory.

Please go to this link for the Language Processing APIs.


Cognitive Computing : Hype or Reality?

Mankind has witnessed two eras of computing and is set to witness the third and most powerful one. This most important era, in which computers will be used as cognitive systems, is stepping in.

Cognitive computing aims to incorporate and simulate, in a computerised model, thought processes that are similar or close to those of the human brain.

For a computer to act as a cognitive system, it dives deep into self-learning by mining data, processing natural language and recognising patterns. By going through these processes, the computer can replicate the way a human brain functions.

There are certainly some functions where the computer has outshone the human mind, such as calculations or tabulations. However, computers have not been able to master tasks that human beings consider genuinely simple, tasks that are based on ‘common sense’. Examples include comprehending the languages that people use to communicate with one another, or identifying unique features in an object the way a human would. So Artificial Intelligence still has a lot of ground to cover before a fully cognitive computational system is in place.

Cognitive computing processes can be put to use in any field, be it medicine, law, education, or even finance. The essence is that any field that offers large amounts of very complex data to be analysed, processed and made sense of can use them to evaluate and solve the problems associated with that field.

Not only that, cognitive systems can be of immense help to the marketing function as a whole, particularly in consumer behaviour analysis, customer support bots, personal shopping bots, etc. They can also be successfully applied by travel agents, security personnel, tutors and even doctors to boost their respective domains.

Hilton Hotels, for example, recently introduced its first concierge robot, Connie, a smart robot that answers questions about the hotel and the sights in its vicinity, including local attractions, in the natural language that humans use to communicate.

Human-like chatbots, face recognition, face detection, object/logo detection, speech-to-text conversion and personal assistants are examples of cognitive computing and services.

One important takeaway from a recent report out of MIT’s Center for Information Systems Research is that today’s cognitive computing tools are best suited to carrying out narrowly defined tasks. Some use cases of cognitive computing tools are helping banks evaluate the creditworthiness of their customers, conducting insurance audits for health care providers, and doing general audits for finance and accounting firms.

One limiting factor for the popularity of AI is that it cannot flourish in situations with a high level of uncertainty, creativity or rapid change.

So to conclude, cognitive computing technologies can be applied to areas that have strong and systematic business rules in place, which can train the machines by guiding algorithms based on large volumes of data. It has a great future for almost all businesses. The faster you adopt it, the higher you can grow.
